Why you do not use GRANT ALL ON *.*?

Why you do not use GRANT ALL ON *.*?

I was with a client today, and after rebooting a MySQL 5.0.22 instance cleanly with /etc/init.d/mysqld service, I observed the following error, because you always check the log file after starting MySQL.

080923 16:16:24  InnoDB: Started; log sequence number 0 406173600
080923 16:16:24 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.0.22-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distribution
080923 16:16:24 [ERROR] /usr/libexec/mysqld: Table './schema_name/table_name' is marked as crashed and should be repaired
080923 16:16:24 [Warning] Checking table:   './schema_name/table_name'

Now, I’d just added to the /etc/my.cnf a number of settings including:

myisam_recovery=FORCE,BACKUP

which explains the last line of the log file. When attempting to connect to the server via the mysql client I got the error:

“To many connections”

So now, I’m in a world of hurt, I can’t connect to the database as the ‘root’ user to observe what’s going on. I know that table it’s decided to repair is 1.4G in size, and the server is madly reading from disk. Shutting down the apache server that was connecting to the database is not expected to solve the problem, and does not, because connections must wait to timeout.

MySQL reserves a single super privileged connection, i.e. ‘root’ to the mysql server specifically for this reason, unless all the connections have this privilege. The problem, as often experienced with clients, is the permissions of the application user is simply unwarranted.

mysql> select host,user,password from mysql.user;
+-----------+-------------+------------------+
| host      | user        | password         |
+-----------+-------------+------------------+
| localhost | root        | 76bec9cc7dd32bc0 |
| xxxxxx    | root        |                  |
| xxxxxx    |             |                  |
| localhost |             |                  |
| %         | xxxxxxxxxxx | 0716d6776318d605 |
| localhost | xxxxxxxxxxx | 0716d6776318d605 |
| localhost | xxxxxxx     | 6885269c4a550a03 |
+-----------+-------------+------------------+
7 rows in set (0.00 sec)

mysql> show grants for xxxxxxx@localhost;
+---------------------------------------------------------------------------------------+
| Grants for xxxxxxx@localhost                                                          |
+---------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO xxxxxxx'@'localhost' IDENTIFIED BY PASSWORD '6885269c4a550a03'  |
| GRANT ALL PRIVILEGES ON `xxxxxxx`.* TO 'xxxxxxx'@'localhost' WITH GRANT OPTION        |
+---------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

So the problem is ALL PRIVILEGES is granted to an application user. Never do this!

The solution is to remove all unused users, anonymous users, and create the application user with just the privileges needed.

DROP USER xxxxxxxxxxx@localhost;
DROP USER xxxxxxxxxxx@'%';

DELETE FROM mysql.user WHERE user='';
FLUSH PRIVILEGES;
DROP USER xxxxxxx@localhost;
CREATE USER xxxxxxx@localhost IDENTIFIED BY 'xxxxxxx';

GRANT SELECT,INSERT,UPDATE,DELETE ON xxxxxxx.* TOxxxxxxx@localhost;

A neat trick for a row number in a MySQL recordset

While working for a client, I had need to produce canned results of certain different criteria, recording the result in a table for later usage, and keep the position within each result.

Knowing no way to do this via a single INSERT INTO … SELECT statement, I reverted to using a MySQL Stored Procedure. For example, using a sample I_S query and the following snippet:

  ...
  DECLARE list CURSOR FOR SELECT select table_name from information_schema.tables where table_schema='INFORMATION_SCHEMA';
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=TRUE;

  OPEN list;
  SET result_position = 1;
  SET done = FALSE;
  lab: LOOP
    FETCH list INTO table_name;
    IF done THEN
      CLOSE list;
      LEAVE lab;
    END IF;
    INSERT INTO  summary_table(val,pos) VALUES (table_name,result_position);
    SET result_position = result_position + 1;
  END LOOP;

However, in reviewing with another colleague after writing some 10+ different queries and SP loops, I realized that it is possible to record the position of each row in a result set using session variables, negating the need for all that code.

SET @rowcount = 0;
SELECT table_name, @rowcount := @rowcount + 1 FROM information_schema.tables WHERE table_schema = 'INFORMATION_SCHEMA';
+---------------------------------------+----------------------------+
| table_name                            | @rowcount := @rowcount + 1 |
+---------------------------------------+----------------------------+
| CHARACTER_SETS                        |                          1 |
| COLLATIONS                            |                          2 |
| COLLATION_CHARACTER_SET_APPLICABILITY |                          3 |
| COLUMNS                               |                          4 |
| COLUMN_PRIVILEGES                     |                          5 |
| ENGINES                               |                          6 |
| EVENTS                                |                          7 |
| FILES                                 |                          8 |
| GLOBAL_STATUS                         |                          9 |
| GLOBAL_VARIABLES                      |                         10 |
| KEY_COLUMN_USAGE                      |                         11 |
| PARTITIONS                            |                         12 |
| PLUGINS                               |                         13 |
| PROCESSLIST                           |                         14 |
| PROFILING                             |                         15 |
| REFERENTIAL_CONSTRAINTS               |                         16 |
| ROUTINES                              |                         17 |
| SCHEMATA                              |                         18 |
| SCHEMA_PRIVILEGES                     |                         19 |
| SESSION_STATUS                        |                         20 |
| SESSION_VARIABLES                     |                         21 |
| STATISTICS                            |                         22 |
| TABLES                                |                         23 |
| TABLE_CONSTRAINTS                     |                         24 |
| TABLE_PRIVILEGES                      |                         25 |
| TRIGGERS                              |                         26 |
| USER_PRIVILEGES                       |                         27 |
| VIEWS                                 |                         28 |
+---------------------------------------+----------------------------+
28 rows in set (0.01 sec)

Of course you need the all important SET before each query, if not specified however, the subsequent query does not result in an error, just NULL.

So all I needed was:

INSERT INTO summary_table(val,pos)
SELECT table_name, @rowcount := @rowcount + 1
FROM information_schema.tables
WHERE table_schema = 'INFORMATION_SCHEMA';

A simple and trivial solution.

DISCLAIMER:
How this performs under load, and how it is supported in different and future versions of MySQL is not determined.

Securing your OS for MySQL with JeOS

Do you have a full time System Administrator? Do you have only a part-time SA, or none at all?

Packet General’s Data Security and PCI Compliance solutions run on a dedicated appliance, based on a “Just Enough Operating System” (JeOS) to minimize exposure.

This appliance actually improves not just the security of your data, but ensures your Operating System is secure and up to date. With only 4 services and a footprint < 600MB this is an ideal solution for running even a normal MySQL installation. Security upgrades can also be provided as an automated feature, eliminating the need for this management internally.

Tomorrow in the MySQL Webinar How to secure MySQL data and achieve PCI compliance which is being held Thursday, September 11, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT we will be discussing this in more detail.

How to secure MySQL data and achieve PCI compliance

This week I will be the moderator for a MySQL Webinar How to secure MySQL data and achieve PCI compliance being held Thursday, September 11, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT.

Recently I wrote about Do you store credit cards in your MySQL Database?. If you do, then PCI Compliance is not something you can ignore.

This webinar will not only be discussing PCI Compliance, but also MySQL data security. Our panel includes Didier Godart from MasterCard Worldwide, one of three members who drafted the Payment Card Industry Data Security Standard 1.0.

For more information on the various PCI Compliance and Encryption options for MySQL , check out the Packet General website.

Naming standards? Singular or Plural

It’s important that for any software application good standards exist. Standards ensure a number of key considerations. Standards are necessary to enforce and provide reproducible software and to provide a level of quality in a team environment, ease of readability and consistency.

If you were going to create a MySQL Naming Standard you have to make a number of key decisions. Generally there is no true right or wrong, however my goals tend towards readability and simplicity. In 2 decades of database design I’ve actually changed my preference between some of these points.

1. Pluralism

Option 1
All database objects are defined in the logical form, that being singular.

For example: box, customer, person, category, user, order, order_line product, post, post_category

Option 2

For database tables & views, objects are defined in plural. For columns, objects are defined singular.

For example: boxes, customers, people, categories, users, orders, order_lines, products, posts, post_categories

Issues
Inconsistency between table name and column name, when using plural. Column names simply are not plural.
When the plural of the name is a completely different spelled word. For example a table of People, and a primary key of PersonId.
When the plural rule is not adding ‘s’, for example replacing ‘y’ with ‘ies’ as with Category.
Strict rule necessary for relationship and intersection tables. Generally, only the last portion is plural.
What about other objects, such as stored procedures for example.

2. Case Sensitivity

Option 1
All database objects should be specified as lowercase only. Words are separated with ‘underscores’.

For example: customer, customer_history, order, order_line, product, product_price, product_price_history

Option 2
All database objects use CamelCase. Words are separated via Case.

For example: Customer, CustomerHistory, Order, OrderLine, Product, ProductPrice, ProductPriceHistory

Issues
Some database products have restrictions here, Oracle for example, UPPERCASES all objects. MySQL allows for both, except for some Operating Systems that does not support Mixed Case properly (e.g. Microsoft)

3.Key names

In a related post I will be discussing natural and surrogate keys. For this purpose, we will assume you are using surrogate keys.

All keys will have a standard name. For Example: id, key, sgn (system global number)

When referencing primary keys and foreign keys, what standard is used?

Option 1
The primary key is the same name across all tables, making it easy to know the primary key of a table.
For foreign keys, the name is prefixed with the table name or appropriate table alias.

For Example: id. The foreign key would be customer_id, order_id, product_id

Option 2
The primary key is defined as unique across the system, and as such the foreign key is the same as the primary key.

For Example: customer_id, order_id, product_id

Issues
When self referencing columns to a table, having a standard is also appropriate, for example parent_product_id.

4. Specific Object Names

Option 1
For the given type of object, have a standard prefix or suffix.

For Example, all tables are prefixed with tbl, all views are suffixed with _view, all columns for a given table are prefixed with table alias.

Option 2
Don’t prefix or suffix an object with it’s object type.

5. Reserved Words

Option 1
Don’t use them. When a word is reserved, find a more description name.

E.g. system_user, order_date,

Option 2
Allow them.

For Example: user, date, group, order

Issues
A number of database systems do not allow the use of reserved words, or historically have not. Sum such as MySQL for example, allow reserved words, but only when additional quoting is used.

6. Abbreviations

This indeed could be a entire topic it’s self. In simplicity, do you use abbreviations other then the most common and everybody knows abbreviations (to the point you don’t know what the abbreviation actually is in real life type abbreviations), or do you not.

I use the example of ‘url’. How many people actually know what url stands for. This is a common abbreviation.

Option 1
For objects, use abbreviations when possible. Don’t use it for key tables, but for child and intersection tables.
For Example: invoice, inv_detail, inv_id,

Option 2
Avoid abbreviations at all costs.
invoice, invoice_detail, invoice_id,

Issues
Unless you have a large schema (e.g. > 500 tables) the use of abbreviations should not be needed given the relative sizes of objects in modern databases.

My Recommendations

There are a lot more considerations then these few examples for naming standards, however as with every design, it’s important to make a start and work towards continual improvement.

  1. All objects are singular (very adamant, people/person, category/categories, sheep/sheep for tables, but not columns – simplicity wins as English is complex for plurals).
  2. All objects are lowercase and use underscore ‘_’ (I really like CamelCase for readability, but for consistency and simplicity, unfortunately lowercase is easier).
  3. All primary key’s are defined as unique across the system.
  4. Don’t use prefixes/suffixes to identity object types.
  5. Never use reserved words.
  6. Don’t use abbreviations except for the most obvious.

At the end of the day, I will work with what standard is in place. What I won’t work with is, when there is no documented, accountable standard.

A 5.1 QEP nicety – Using join buffer

I was surprised to find yesterday when using MySQL 5.1.26-rc with a client I’m recommending 5.1 to, some information not seen in the EXPLAIN plan before while reviewing SQL Statements.

Using join buffer

+----+-------------+-------+--------+---------------+--------------+---------+------------------------+-------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key          | key_len | ref                    | rows  | Extra                                        |
+----+-------------+-------+--------+---------------+--------------+---------+------------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | lr    | ALL    | NULL          | NULL         | NULL    | NULL                   |  1084 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | ca    | ref    | update_check  | update_check | 4       | XXXXXXXXXXXXXXXXX      |     4 | Using where; Using index                     |
|  1 | SIMPLE      | ce    | ALL    | NULL          | NULL         | NULL    | NULL                   | 13319 | Using where; Using join buffer               |
|  1 | SIMPLE      | co    | eq_ref | PRIMARY       | PRIMARY      | 4       | XXXXXXXXXXXXXXXXX      |     1 | Using where                                  |
+----+-------------+-------+--------+---------------+--------------+---------+------------------------+-------+----------------------------------------------+
4 rows in set (0.00 sec)
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.26-rc |
+-----------+
1 row in set (0.00 sec)

Sergey Petrunia of the MySQL Optimizer team writes about this in Use of join buffer is now visible in EXPLAIN.

An intestesting approach to free hosting

I came across the OStatic Free hosting service that provide Solaris + Glassfish (Java Container) + MySQL.

They offer “Now you can get free Web hosting on Cloud Computing environment free of charge for up to 12 months.

The catch “accumulate 400 Points for participating on the site“.

A rather novel approach, you get 100 points for registering, but then only 5-15 points per task. The equates to approximately at least 30 tasks you need to perform, such as answering a question, or reviewing somebodies application. That seems like a lot of work?

In this case, Free has definitely a cost and time factor to consider.

Get linked to Drizzle

We are always looking at different ways to help promote, inform and identify contributers, users and supports for Drizzle.

One way is to join the Linked In Drizzle group (click here when logged in). You will already see a few MySQL die hards, but also the need breed of names that are part of Drizzlemania.

Choosing MySQL 5.1 over 5.0

I have been asked twice this week what version of MySQL I would choose for a new project.
As with most questions in life the answer is: It Depends?

In general I would now recommend for a new project to select 5.1, and he is why.

  1. If it’s a new project and your not managing existing applications with older versions then 5.1 is slated for General Availability (GA) at some imminent time. Having been at Release Candidate (RC) for quite some time (almost 1 year), many people, both internally and in the community are just waiting for Sun/MySQL to get this version out.
  2. MySQL 5.0 is in maintenance mode, it’s now 3 years old. MySQL is placing (I’m assuming) resourcing energies to current and future releases.
  3. If your looking at releasing a product in the next 3 months for example, you do not want to consider the testing and deployment of a new version (e.g. 5.1) in the next 6-9 months.
  4. Unless your comparing specific performance between 5.1 and 5.0 in your edge cases, for a new project start with 5.1 you should be testing and confirming performance and reliability here. The worse case is you can test in 5.0 of any specific problem.
  5. 5.1 gives you new features of course, partitioning may be of benefit but don’t assume it’s going to be a great improvement unless you applications SQL naturally tended to the MySQL partitioning strengths.
  6. The single biggest benefit is the Pluggable Storage Engine Architecture. This can give you some benefits, and in the case of transaction storage engines that are production ready, Innodb now has a pluggable version, much improved on the MySQL supplied version. There are a long list of other engines under development with relative strengths and weakness, however be wary of versions that require customized builds of MySQL.

There are some concerns where I don’t have answers? For example, if you have MySQL Support , is 5.1 supported? I know a common answer to problems in pre 5.0 versions is, have you tried upgrading to 5.1

Why is not released? This is good question, the answer is obviously a level of quality, however it is generally discussed that 5.1 is of better quality in existing features then 5.0. It is 5.1 specific features you need to be careful of. It’s important that you do read carefully the 5.1 Release Notes to see where bug fixes or compatibility changes are still occuring.

As with any choice in the Open Source world, some level of risk assessment is necessary. If you have good metrics and measurement in place for your system, and you adequately test your software, there is no reason not to now consider 5.1 as a viable alternative for new development.

As I post this I note, I see the yet unreleased 5.1.28 list of bugs still shows issues of concern.

Using consistent data types for columns

I came across this error recently when trying to modify the data type of a column.

ERROR 1025 (HY000): Error on rename of './sakila/#sql-1d91_5' to './sakila/inventory' (errno: 150)

Not the first time, and not the last time. A common problem with InnoDB tables, is the lack of information, you need to dig deeper with the following command (and appropriate security a well organized security profile will NOT have).

mysql> SHOW ENGINE INNODB STATUS;

...

------------------------
LATEST FOREIGN KEY ERROR
------------------------
080717 20:00:28 Error in foreign key constraint of table sakila/inventory:
there is no index in the table which would contain
the columns as the first columns, or the data types in the
table do not match the ones in the referenced table
or one of the ON ... SET NULL columns is declared NOT NULL. Constraint:
,
  CONSTRAINT "fk_inventory_film" FOREIGN KEY ("film_id") REFERENCES "film" ("film_id") ON UPDATE CASCADE

...

You also need to dig though the output of the command to find this, on a larger system this can be quite a lot of information just to find the details of the error. (It would be nice if there was an easier way.)

The result of this error is the columns in a foreign key relationship need to be of the same data type. This is actually a good thing, and MySQL generally operates much better in joins when the join columns are consistent.

On the assumption that you use surrogate primary keys for tables, this is a candidate for a naming standard for primary keys should be the primary key name is unique column name within your schema.

For example, if you call all your primary key’s ‘id’, your foreign key’s are normally ‘table_id’. While this is a common approach that I promoted myself for many years, it’s easy to read and consistent, actually naming every primary key uniquely provides two great benefits.

First, you can easily identify relationships in your entire schema without even knowing about the schema in detail. Second, you can leverage the benefit of the INFORMATION_SCHEMA to in the case of this post, confirm the data types are consistent for all matching columns, even when Referential Integrity is not used.

So, instead of using ‘id’, use ‘actor_id’, ‘film_id’, ‘user_id’ etc. For any self join keys, I normally prefix with parent, so ‘parent_user_id’ for example if you have a hierarchy within a table.

MySQL involvement in OSCON opening keynote

Before I get to post my OSCON reflection I see I didn’t post this (which I reference).

At OSCON opening keynotes Tim O’Reilly Interviews Monty Widenius & Brian Aker. This provided some interesting answers in a Q & A session. Here is some of the discussion.

TO: So 6 months in. How is it with Sun?
BA: Really rewarding environment. My first question was? You are going to send me free H/W. No H/W has been delivered yet, or access to the masses, still hoping. Sun is a very engineering driven company.
MW. Thanks God we didn’t go public. Starting to do closed sourced components, going public this would have continued.

TO: Sun saved MySQL from public market/ insulated from market.
MW: 6 months in, Sun still trying to figure out what they bought. Sun has made a commitment to open source throughout the organization. Engineers who have been working in closed environments, now seeing this all in public, and opens yourself up to more inputs and exposure.

TO: You have your own projects within sun, how does that affect with the main line of development of MySQL, Monty you with the Maria Storage Engine, Brian you with Drizzle.

TO: What is the Support like in Sun?
BA: My boss got it. We are looking at going after different market area, niche and ecosystem. There is certain direction the main codebase is heading such as enterprise features, oracle like replacements. There is a core set of environments what they aren’t needed. Additional new requirements like a proximity data storage, historically Postgres has been good for this type of GIS data. This is a new type of data store. location/time and proximity of objects.
Sun has given us more free hands to work for best features of MySQL. For Drizzle, to strip it down into more components architecture and extensibility. It’s a micro-kernel there will be an interface for large parts of the code.

TO: What do you think about Google?
BA: Happy opening up more of their data, and trying to turn the world into their own 20% of time project.

TO: What do you think about Amazon?
BA: Interesting position, secretive company. At the beginning how little anybody though of Amazon in a service marketplace.

TO : What do you think about Microsoft?
MW: less and less things are good.
BA: Irrelevant.

TO: What do you think about Apple?
NW: More afraid of Apple then Microsoft
BA: Really want an iPhone, but hoping Google will get Android out and it works.

TO: What are the cool things MySQL can do on the Sun field, and reverse?
BA: Both Sun and MySQL Engineers thought about open source differently. MySQL has created a set of steps of evolution, e.g. employees contributing to open source projects. MySQL’s DNA was very small, it’s interesting how fast this is influencing Sun’s approach
MW: MySQL has become to management driven in previous years, Sun has enabled us to get back to our roots.

Where the happening community people now hang

Eric of Proven Scaling commented on a lack of IRC action in the normal mysql channels today when he visited the #drizzle channel on irc.freenode.net.

ebergen: I'm still in #mysql-dev and #planet.mysql but they are hardly active these days [1:51pm]
rbradfor: ebergen: funny, #drizzle is where the action is. [1:51pm]

There is active movement on the Drizzle project. Why is this? Well, I think most importantly is that there is active contribution from the community, at least 5 different companies and more individuals are pushing code to Drizzle, and it’s being accepted and incorporated. Something you can not say about the MySQL Community branch.

As I write this, there are 35 active people on the #drizzle channel now, and 137 members of the Drizzle Discuss list.

My contribution is as Monty put’s it, “Your the build team”. I am managing the Build Master for Drizzle and my company 42SQL is providing the hosting and support. I’ve even managed to push my first small code changes to the project using the very simple Contributing Code instructions. No fuss, no pain, and I don’t care if it doesn’t get included, but it’s available for all to see and use.

In 2 days we now have 15 build slaves covering Ubuntu 8.04 32 & 64bit, Debian 32 & 64 bit, CentOS 5 64 bit, Gentoo 32 & 64 bit, and Mac OS/X 10.5, with definitely some color at times in the waterfall display.

Jay has a good article on Drizzle Buildbot Now Accepting BuildSlaves.

We need your help! There are plenty of Linux/Unix OS’s out there, and we want to know Drizzle can be compiled as broadly as possible. Most of the contributors of build slaves to date are not names I know well in the MySQL community, which is excellent. What I’d like to see is more names I do know.

It’s easy, just check out Instructions for setting up a BuildSlave for Drizzle.

Drizzle needs you


Use MySQL, but want to follow the new kid on the block?
Want to help contribute to Drizzle?

We are seeking help in compiling across different platforms.
Please help us by becoming a buildbot slave.

There are detailed instructions, so now is the time to take a few minutes and help out the project.

The Drizzle Buildbot is hosted and supported by 42SQL.

Building sources with BuildBot

Unless your in the desert under a rock (where rain is clearly needed), you will have heard of Drizzle – A Lightweight SQL Database for Cloud and Web. My company 42SQL is sponsoring the BuildBot for the Drizzle project. BuildBot is a system to automate the compile/test cycle required by most software projects to validate code changes.

Check out Installing Buildbot for what’s necessary to get a working installation. This is necessary for the Master and Slaves.

Configuration was a little more complicated then expected, due to lack of accurate documentation, and reading old docs at sourceforge. Be sure now to read here.

This is a step by step approach I used to successfully configure Drizzle Build Bot (Master and Slave).

1. Create OS User.

su -
useradd buildbot
su - buildbot

2. Create Master Installation

buildbot create-master /home/buildbot/master
cd /home/buildbot/master
cp master.cfg.sample master.cfg
vi master.cfg

Here is a diff of what simple changes I made to the master.cfg.sample work in my environment.

$ diff master.cfg.sample master.cfg
23c23
< c['slaves'] = [BuildSlave("bot1name", "bot1passwd")]
---
> c['slaves'] = [BuildSlave("centos5_64", "paSSw0rd")]
95c95
< cvsroot = ":pserver:[email protected]:/cvsroot/buildbot"
---
> cvsroot = ":pserver:[email protected]:/cvsroot/buildbot"
105c105
< f1.addStep(Trial(testpath="."))
---
> #f1.addStep(Trial(testpath="."))
108c108
< 'slavename': "bot1name",
---
>       'slavename': "centos5_64",
148c148
< #c['debugPassword'] = "debugpassword"
---
> c['debugPassword'] = "paSSw0rd"
156c156
< #                                       "admin", "password")
---
> #                                       "admin", "paSSw0rd")
166,167c166,167
< c['projectName'] = "Buildbot"
< c['projectURL'] = "http://buildbot.sourceforge.net/"
---
> c['projectName'] = "Drizzle Buildbot"
> c['projectURL'] = "https://launchpad.net/drizzle/"

Initially I’m just going to test with a CVS checkout of buildbot to confirm operations.
NOTE: The example provided pserver URL in master.cfg.sample is invalid.

3. Start Master

buildbot start /home/buildbot/master > start.log
tail -f /home/buildbot/master/twistd.log

4. Confirm Master

lynx http://drizzlebuild.42sql.com:8010

5. Create Slave

buildbot create-slave /home/buildbot/slave drizzlebuild.42sql.com:9989 centos5_64 PaSSw0rd

I got stuck here based on docs, be sure the port number is the client port.

6. Configure Slave

cd /home/buildbot/slave/info
echo "Ronald Bradford < ronald .bradford @ google mail >" > admin
echo "Drizzle CentOS 5 64bit "`uname -a` > host
cat admin host

7. Start Slave

cd /home/buildbot/slave
buildbot start /home/buildbot/slave > start.log
tail -f /home/buildbot/slave/twistd.log

8. Confirm Slave

If everything is working by the time you look at the twistd.log you will see work happening.
You can also via the web interface and see in Lasted Builds the first build is working.

9. Stop Services

To stop the BuildBot master and slave.

buildbot stop /home/buildbot/master
buildbot stop /home/buildbot/slave

10. Change Master Configuration

Should you make any changes to master.cfg the following command will re-read the configuration file.

buildbot sighup /home/buildbot/master

11. Startup
The following was added to cron

@reboot buildbot start /home/buildbot/master
@reboot buildbot start /home/buildbot/slave

The @reboot is new sytax for me, so I can’t yet confirm it’s operation.

If you want to be a build slave for Drizzle, check out Instructions here

MySQL Proxy lua scripts from presentation

The following Lua scripts are the examples are from my MySQL Proxy @ OSCON 08 presentation.

analyze_query.lua

MySQL Proxy Analyze Query.

Requires MySQL Proxy Logging Module.

What is released is the Version for MySQL 5.0. A generic version for all MySQL versions is not yet released.

histogram.lua

This script is part of the standard MySQL Proxy examples.

Other Scripts

Additional Lua scripts from MySQL forge are available here.

MySQL Proxy @ OSCON 08

Today I presented with Giuseppe Maxia of Sun Microsystems Inc at OSCON 08 on “MySQL Proxy: From Architecture to Implementation”. I was surprised to find that MySQL has a strong showing with a number of presentations this week.

Our talk covered the basics of MySQL Proxy, what’s coming in future features, and a number of examples of how I have used Proxy in consulting engagements to improve the information retrieval particularly for identifying performance problems.

Download Presentations Slides

The fast paced open source ecosystem

This morning at OSCON 08, Tim O’Reilly’s opening keynote Open Source on the O’Reilly Radar included a slide on Drizzle, giving this new project maximum exposure to the Open Source community.

Drizzle was only officially announced yesterday in Drizzle, Clouds, “What If?” by primary architect Brian Aker. Things move fast. There has been a number of comments from people yesterday including Mark Attwood, Monty Widenus,Monty Taylor,Ronald Bradford, Arjen Lentz, Lewis Cunningham, Jeremy Cole, 451 Group,Matt Asay, Assaf Arkin, SlashDot, Builder.au and MySQL HA.

The Drizzle Launchpad project has reached 5th on a Google Search.

Unfortunately, not all uptake and feedback was positive. The official Wikipedia page for Drizzle was marked for speedy deletion almost instantly, and within a few hours permanently deleted.

The new kid on the block – Drizzle

Before today, Drizzle was known as a light form of rain found in Seattle (among other places). Not any more. If you have not read the news already today, Drizzle, Clouds, “What If?” is the new kid on the RDBMS bock.

Faster, leaner and designed with the original goals of ease-of-use, reliability and performance, Drizzle will make an impact in those organizations that are seeking a viable database storage solution for large scalable applications. The key to Drizzle is several fold. First, the crud has been removed. The first part of Drizzle development is to remove bloat or non functioning software from the MySQL tree. In fact if you monitor the commits, it reads like, this has been removed, these files have been deleted, this code has been refactored, this new library has been introduced. Design decisions that have limited MySQL’s development for years are being simply cast aside.

The current landscape has become more complicated in 2008. You have the official MySQL releases, 5.0 is becoming ancient (being released almost 3 years ago), 5.1 is now clearly a lame duck, no release date for past few years, (the internal joke was 5.1 will be released in Q2, but the year is unspecified). 6.0 is in identity crisis with beta parts in alpha. These versions are moving so slowly, they are moving towards extinction like the dinosaurs. Monty Widenius is working solidly on Maria (and unofficial MySQL 5.1 branch), probably more stable and possibly released before 5.1. MySQL cluster has gone it’s own way, it was shackled by the 5.1 legacy and simply couldn’t wait for a GA product.

Jim Starkey (creator of Falcon, the new 6.0 storage engine) now is working in the clouds with Nimbus DB. Dorsal Source is lying dormant, Proven Scaling has it’s enterprise binaries and now Percona has it’s own patched ports. You have strong patches from Google and eBay that have zero hope of every being introduced into the official MySQL releases, probably until 7.x (5.1 and 6.0 have been frozen for a long time). Innodb from Oracle invested heavily in new features in a 5.1 plugin, announced at the MySQL Conference, broken in 1 day by MySQL releasing a new RC version making it in-compatible. Kickfire and Infobright have their own hacked versions, and Nitro DB I suspect have just given up waiting (now like 2 years).

With Sun’s acquisition now at T+6 months, cash and resources doesn’t appear to have helped with the official product. The single greatest movement in this period is that MySQL is now hosted under Launchpad, enabling anybody to access the source code, and even create branches like Jim Winstead reported. However I doubt you will see this helping code getting in the mainline product, but at least it will be more visible. This was an initiative long before the Sun acquisition, and indeed is against Sun policy of using Mercurial.

So why is Drizzle going to be any different or better?

You start with a committed list of contributors from already 6-7 different organizations. The clear goals of simplification, to make it faster and scale better on multi-core servers echo the work being done. You have developers who work in real world situations, not just coders for many years without experiencing operational use, and you have zero sales and marketing getting in the way. Removal of incomplete or stagnant functionality is key for the alpha version and includes stored procedures, triggers, prepared statements, query cache, extra data types, full-text, timezones etc is just the start.

Being small and nimble will enable Drizzle to develop and release code in much shorter iterations. You will see new developments allowing far greater plugin support via the new modularity approach and far better coding standards, making expert knowledge of how MySQL internals work a lesser requirement to contribute.

Will it fizzle, will it dazzle? Drizzle has the potential to be a stellar product. I’m a supporter and I hope to contribute in some small way.

References


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

An East Coast option

Within the present MySQL ecosystem, there are limited options for dedicated MySQL Consulting in the US. Outside of the official Sun/MySQL Consulting, Percona and Proven Scaling both based in Silicon valley are the only options generally known and accepted by the MySQL Community.

There is now an east coast option based in New York, and that is Ronald Bradford. Providing expert MySQL Consulting in Architecture, Performance, Scalability, Migration and Knowledge Transfer.

With two decades working in the IT industry, Ronald is well qualified in MySQL having previously provided consulting services for MySQL Inc combining 9 years experience with the product. His consulting experience is not limited to MySQL, having also worked extensively with Oracle, and previously with Ingres. More details of this experience is available at Linked In

This week you will find him on the west coast. If your at OSCON 2008, then please track me down. You can use my Contact Form, email [me] at [this domain], ping me on Twitter, track me on irc://irc.freenode.net (~arabxptyltd) or drop in to my OSCON session at 2:35pm Thursday.

Your data and the cloud

I will be speaking on July 29th in New York at an Entrepreneurs Forum on A Free Panel on Cloud Computing. With a number of experts including Hank Williams of KloudShare, Mike Nolet of AppNexus, and Hans Zaunere of New York PHP fame is should be a great event.

The focus of my presentation will be on “Extending existing applications to leverage the cloud” where I will be discussing both the advantages of the cloud, and the complexities and issues that you will encounter such as data management, data consistency, loss of control, security and latency for example.

Using traditional MySQL based applications I’ll be providing an approach that can lead to your application gaining greater power of cloud computing.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

When (n) counts?

I have seen on many engagements the column data type is defined as INT(1).

People have the misconception that this numeric integer data type is of the length of one digit, or one byte. (One digit is 0-9 an one byte is 0-255)

This is incorrect.

Integer

For integer numeric data types in MySQL, that is TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT the (n) has no bearing on the size of data stored within the specific data type. The (n) is simply for display formatting.

In the MySQL Manual 10.2. Numeric Types you read This optional display width is used to display integer values having a width less than the width specified for the column by left-padding them with spaces. The display width does not constrain the range of values that can be stored in the column, nor the number of digits that are displayed for values having a width exceeding that specified for the column.

The following example shows the (n) in this case 3 has no effect on the size of data stored.

DROP TABLE IF EXISTS numeric_int;
CREATE TABLE numeric_int(i INT(3) NOT NULL);
INSERT INTO numeric_int VALUES (1),(22),(333),(444),(55555);
SELECT * FROM numeric_intG
i: 1
i: 22
i: 333
i: 444
i: 55555

Floating Point

When it comes to floating point precision of FLOAT and DOUBLE, the syntax of (m,n) has a different inteperation. The manual states A precision from 0 to 23 results in a four-byte single-precision FLOAT column. A precision from 24 to 53 results in an eight-byte double-precision DOUBLE column.
I will discuss this some more in a different post with some interesting findings.

And MySQL allows a non-standard syntax: FLOAT(M,D) or REAL(M,D) or DOUBLE PRECISION(M,D). Here, “(M,D)” means than values can be stored with up to M digits in total, of which D digits may be after the decimal point. For example, a column defined as FLOAT(7,4) will look like -999.9999 when displayed. MySQL performs rounding when storing values, so if you insert 999.00009 into a FLOAT(7,4) column, the approximate result is 999.0001.

So in the case of FLOAT,DOUBLE the (n) does both affect storage and presentation where it rounds the number as confirmed by the following test. Look a the last 2 rows for the rounding confirmation.

DROP TABLE IF EXISTS numeric_float;
CREATE TABLE numeric_float(f1 FLOAT(10,5)  NOT NULL);
INSERT INTO numeric_float values (1),(2.0),(3.12345),(4.123451),(5.123456);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0
SELECT * FROM numeric_floatG
f1: 1.00000
f1: 2.00000
f1: 3.12345
f1: 4.12345
f1: 5.12346
5 rows in set (0.01 sec)

Fixed Precision

The DECIMAL data type (NUMBER is a synonym) stores numbers to a fixed number of precision. From the manual again When declaring a DECIMAL or NUMERIC column, the precision and scale can be (and usually is) specified; for example: salary DECIMAL(5,2)
In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point. If the scale is 0, DECIMAL and NUMERIC values contain no decimal point or fractional part.

So in our test:

DROP TABLE IF EXISTS numeric_decimal;
CREATE TABLE numeric_decimal(f1 DECIMAL(10,5)  NOT NULL);
INSERT INTO numeric_decimal values (1),(2.0),(3.12345),(4.123451),(5.123456);
Query OK, 5 rows affected, 2 warnings (0.00 sec)
SELECT * FROM numeric_decimalG
f1: 1.00000
f1: 2.00000
f1: 3.12345
f1: 4.12345
f1: 5.12346

What is also interesting is that with a FLOAT, the rounding of a number greater then (n), produces no warnings, yet when using DECIMAL you will see warnings. These are:

INSERT INTO numeric_decimal values (1),(2.0),(3.12345),(4.123451),(5.123456);
Query OK, 5 rows affected, 2 warnings (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 2

mysql> show warnings;
+-------+------+-----------------------------------------+
| Level | Code | Message                                 |
+-------+------+-----------------------------------------+
| Note  | 1265 | Data truncated for column 'f1' at row 4 |
| Note  | 1265 | Data truncated for column 'f1' at row 5 |
+-------+------+-----------------------------------------+
2 rows in set (0.00 sec)

What is also interesting is that the manual states the following When such a column is assigned a value with more digits following the decimal point than are allowed by the specified scale, the value is converted to that scale. (The precise behavior is operating system-specific, but generally the effect is truncation to the allowable number of digits.)

The number is generally truncated, buy differs per OS. In the case on Mac O/S and Linux it is rounded. The two test environments in this case where:

mysql> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| protocol_version        | 10                           |
| version                 | 5.1.23-rc                    |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | i686                         |
| version_compile_os      | apple-darwin9.0.0b5          |
+-------------------------+------------------------------+
5 rows in set (0.01 sec)

mysql> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| protocol_version        | 10                           |
| version                 | 5.1.24-rc                    |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | i686                         |
| version_compile_os      | redhat-linux-gnu             |
+-------------------------+------------------------------+
5 rows in set (0.41 sec)

Conclusion

So just to conclude, (n) for Integer types is for display formatting only, (m,n) for floating point will round the number at n places, while in fixed point (m,n) n will round or truncate the number.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

References

The minimum testing for a shared disk MySQL environment

Recently I was asked to provide guidelines for testing fail over of a MySQL configuration that was provided by a hosting provider.

The first observation was the client didn’t have any technical details from the hosting provider of what the moving parts were, and also didn’t have any confirmation other then I think a verbal confirmation that it had been testing.

The first rule in using hosting, never assume. Too many times I’ve seen details from a client stating for example H/W configuration, only to audit and find out otherwise. RAID is a big one, and is generally far more complex to determine. Even for companies with internal systems I’ve seen the most simple question go unanswered. Q: How do you know your RAID is fully operational? A: Somebody will tell us? It’s really amazing to investigate on site with the client to find that RAID system is running in a degraded mode due to a disk failure and nobody knew.

It took some more digging to realize the configuration in question was with Red Hat Cluster Suite. A word of warning for any clients that use this, DO NOT USE MyISAM. I’ll leave it to the readers to ask me why.

Here is a short list I provided as the minimum requirements I’d test just to ensure the configuration was operational.

Verifying a working Red Hat Cluster Suite MySQL Environment

The MySQL Environment

The database environment consists of two MySQL database servers, configured in an active/passive mode using a shared disk storage via SAN.
For the purposes of the following procedures the active server will be known as the ‘primary’ server, and the passive server will be the ‘secondary server’.
The two physical servers for the purposes of these tests will be defined as ‘alpha’ and ‘beta’, with specific H/W that does not change during these tests.

Normal Operations

Expected Configuration under normal operations.

Primary Server

  • server is pingable
  • server accepts SSH Connection
  • MySQL service is started
  • has /data appropriately mounted
  • has assigned VIP address
  • MySQL configuration file and settings are correct

Secondary Server

  • server is pingable
  • server accepts SSH Connection
  • MySQL service IS NOT started
  • DOES NOT have /data mounted
  • DOES NOT has assigned VIP address
  • MySQL configuration file is not available

1. Reboot servers ‘alpha’ and ‘beta’.

Test Status:

  • alpha server is the designated primary server
  • alpha and beta servers are operational

Action:
1.1 Restart alpha server (init 6)
1.2 Restart beta server (init 6)

Checklist:
1.3 Alpha server matches primary server configuration
1.4 Beta server matches secondary server configuration

2. Controlled fail over from ‘alpha’ to ‘beta’

Test Status:

  • alpha server is the designated primary server
  • alpha and beta servers are operational

Action:
2.1 Alpha server – Instigate Cluster failover (clusvcadm -r mysql-svc)

Checklist:
2.2 Beta server matches primary server configuration
2.3 Alpha server matches secondary server configuration

3. Controlled failover from ‘beta’ to ‘alpha’

Test Status:

  • beta server is the designated primary server
  • alpha and beta servers are operational

Action:
3.1 beta server – Instigate Cluster failover (clusvcadm -r mysql-svc)

Checklist:
3.2 Alpha server matches primary server configuration
3.3 Beta server matches secondary server configuration

Exception Operations

4. Loss of connectivity to primary server

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
4.1 Stop networking services on ‘alpha’ (ifdown bond0)

Checklist:
4.2 Monitoring detects and reports connectively loss
4.3 Automated failover occurs
4.4 Beta server matches primary server configuration
4.5 Alpha server matches secondary server configuration

5. Restore connectivity to secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online, but not accessible via private IP

Action:
4.1 Start networking services on ‘alpha’ (ifup bond0)

Checklist:
5.2 Monitoring detects and reports connectively restored
5.3 No failback occurs
5.4 Beta server matches primary server configuration
5.5 Alpha server matches secondary server configuration

6. Loss of connectivity to secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online

Action:
6.1 Stop networking services on ‘alpha’ (ifdown bond0)

Checklist:
6.2 Monitoring detects and reports connectively lost
6.3 No failback occurs
6.4 Beta server matches primary server configuration
6.5 Alpha server matches secondary server configuration

7. Restore connectivity to secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online, but not accessible via private IP

Action:
7.1 Start networking services on ‘alpha’ (ifup bond0)

Checklist:
7.2 Monitoring detects and reports connectively restored
7.3 No failback occurs
7.4 Beta server matches primary server configuration
7.5 Alpha server matches secondary server configuration

8. Power down secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online

Action:
8.1 Power down alpha (init 0) NOTE: Need remote boot capabilities

Checklist:
8.2 Monitoring detects and reports connectively lost
8.3 Beta server matches primary server configuration
8.4 Additional paging for extended down time for ‘degraded support for failover’

9. Loss of connectivity to primary server

Test Status:

  • beta server is the designated primary server
  • alpha server is offline

Action:
9.1 Power down beta (init 0) NOTE: Need remote boot capabilities

Checklist:
9.2 Monitoring detects and reports connectively lost
9.3 Site database connectively completely unavailable
9.4 Additional paging for loss of HA solution

10. power restored to secondary server
Test Status:

  • alpha server is offline
  • beta server is offline

Action:
10.1 Power on alpha

Checklist:
10.2 Monitoring detects and reports server up
10.3 Alpha server assumes primary role (previously it was beta)
10.4 Alpha server matching primary server configuration
10.5 Addition paging for degraded HA

11. power restored to secondary server

Test Status:

  • alpha server is primary server
  • beta server is offline

Action:
11.1 Power on beta

Checklist:
11.2 Monitoring detects and reports server up
11.3 Alpha server matching primary server configuration
11.4 Beta server matching secondary server configuration

Database Operations

12. MySQL services on primary server go offline

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
12.1 Stop mysql services on ‘alpha’ (/etc/init.d/mysqld stop)

Checklist:
12.2 Monitoring detects and reports database loss (while connectivity is still available)
12.3 Automated failover occurs
12.4 Beta server matches primary server configuration
12.5 Alpha server matches secondary server configuration

13. MySQL services on secondary server go offline

Test Status:

  • beta server is the designated primary server
  • alpha server is online

Action:
13.1 stop mysql services on ‘beta’ (/etc/init.d/mysqld stop)

Checklist:
13.2 Monitoring detects and reports database loss (while connectivity is still available)
13.3 Automated failover occurs
13.4 Alpha server matches primary server configuration
13.5 beta server matches secondary server configuration

14. Load Testing during failure

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
14.1 Agressive load testing against database server
14.2 MySQL killed without prejudice (killall -9 mysqld_safe mysql)

Checklist:
14.3 Monitoring detects and reports mysql service loss
14.4 Automated failover occurs
14.5 Beta server matches primary server configuration
14.6 Alpha server matches secondary server configuration
14.7 Beta mysql logs shows a forced MySQL Recovery in logs

15. Forced Recovery

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
15.1 Manual full database backup is done (in case recovery does not work). Hosting Provider not told of this.
15.2 Dummy new table/schema is created (used as verification point)
15.3 Database on alpha primary server is dropped
15.4 Hosting Provider is notified stating a full database recovery including Point In time to just before drop (no time given, only command that was run)

Checklist:
15.5 Site is marked as unavailable
15.6 Hosting Provider restore data from backup and recover to point in time
15.7 Confirmation that new table/schema is restored, and full schema is available
15.8 Site is made available
15.9 Record of time for full disaster is recorded

Conclusion

This is not an exhaustive test, in fact it is just a documented approach for consideration to show a client what the minimum testing should be. As no dry run actually occurred, there may be inaccuracies and additions necessary to this document when first executed. I would need access to an appropriate configuration in order to perform a level of testing to complete this document.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

BIGINT v INT. Is there a big deal?

The answer is yes.

In this face off we have two numeric MySQL data types, both Integer. In fact MySQL has 9 different numeric data types for integer, fixed precision and floating point numbers, however we are just going to focus on two, BIGINT and INT. This design consideration is part of my recent presentation Top 20 Design Tips for Data Architects.

What is the difference?
We turn to the MySQL Reference Manual first, in 10.1.1. Overview of Numeric Types we see the following.


INT[(M)] [UNSIGNED] [ZEROFILL]

A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.

BIGINT[(M)] [UNSIGNED] [ZEROFILL]

A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.

Ok, well an INT can store a value to 2.1 Billion, and an a BIGINT can store a value to some larger number to 20 digits. That MySQL search didn’t help much with details, we have to dig deeper to find 10.2. Numeric Types in which we find that INT is a 4 byte integer, and a BIGINT is an 8 byte integer.

So what’s the big deal?

Quite a lot actually. Using INT rather then BIGINT can make a significant reduction in disk space. Just this one change alone can save you 10%-20% (depends on your particular situation). More significantly, when used as a primary key, and for foreign keys and indexes, reducing your index size could be 50%, and this will improve performance when these indexes are used.

My approach is this. Let’s just focus on primary keys and foreign keys to begin with. Are you going to store more then 2.1 Billion rows in your table? The answer should be no? Should you say yes, then you do have grand plans, but you are also failing to consider the ramifications of handling larger data sets (a topic for later discussion).

There are exceptions to this rule, if you do a huge number of inserts and deletes, then while you may not have 2.1 Billion rows, you may have done 2.1 Billion inserts. Again better design practices should be considered in this case.

The Test

As with everything, we need some evidence to stake the claim. Using the Sakila sample database.

We start with a simple intersection table, that has a high number of numeric only columns. This will show the best case situation.

We will create two tables, one with all BIGINT columns, and one with all INT columns and then compare the size. These tables are only small, but they show the proportion of savings of disk space.

CREATE TABLE inventory_bigint LIKE inventory;
ALTER TABLE inventory_bigint
  MODIFY inventory_id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY film_id BIGINT UNSIGNED NOT NULL,
  MODIFY store_id BIGINT UNSIGNED NOT NULL;
INSERT INTO inventory_bigint SELECT * from inventory;
CREATE TABLE inventory_int LIKE inventory;
ALTER TABLE inventory_int
  MODIFY inventory_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY film_id INT UNSIGNED NOT NULL,
  MODIFY store_id INT UNSIGNED NOT NULL;
INSERT INTO inventory_int SELECT * from inventory;

select table_name,engine,row_format, table_rows, avg_row_length,
        (data_length+index_length)/1024/1024 as total_mb,
         (data_length)/1024/1024 as data_mb,
         (index_length)/1024/1024 as index_mb
from information_schema.tables
where table_schema='sakila'
and   table_name LIKE 'inventory%'
order by 6 desc;
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| table_name       | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb    |
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| inventory_bigint | InnoDB | Compact    |     293655 |             51 | 43.60937500 | 14.51562500 | 29.09375000 |
| inventory_int    | InnoDB | Compact    |     293715 |             37 | 29.54687500 | 10.51562500 | 19.03125000 |
| inventory        | InnoDB | Compact    |     293707 |             33 | 22.54687500 |  9.51562500 | 13.03125000 |
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
3 rows in set (0.15 sec)

In this example, the data portion decreased from 14MB to 10MB or 28%, and the index portion from 29M to 19M or 34%.

CREATE TABLE customer_bigint LIKE customer;
ALTER TABLE customer_bigint
     MODIFY customer_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
     MODIFY store_id BIGINT UNSIGNED NOT NULL,
     MODIFY address_id BIGINT UNSIGNED NOT NULL,
     MODIFY active BIGINT UNSIGNED NOT NULL;

CREATE TABLE customer_int LIKE customer;
ALTER TABLE customer_int
     MODIFY customer_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
     MODIFY store_id INT UNSIGNED NOT NULL,
     MODIFY address_id INT UNSIGNED NOT NULL,
     MODIFY active INT UNSIGNED NOT NULL;

select table_name,engine,row_format, table_rows, avg_row_length,
        (data_length+index_length)/1024/1024 as total_mb,
         (data_length)/1024/1024 as data_mb,
         (index_length)/1024/1024 as index_mb
from information_schema.tables
where table_schema='sakila'
and   table_name LIKE 'customer%'
order by 6 desc;

+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| table_name      | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb    |
+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| customer_bigint | InnoDB | Compact    |     154148 |            139 | 37.09375000 | 20.54687500 | 16.54687500 |
| customer_int    | InnoDB | Compact    |     151254 |            121 | 30.06250000 | 17.51562500 | 12.54687500 |
| customer        | InnoDB | Compact    |      37684 |            125 |  7.81250000 |  4.51562500 |  3.29687500 |
| customer_list   | NULL   | NULL       |       NULL |           NULL |        NULL |        NULL |        NULL |
+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
4 rows in set (0.22 sec)

In this example, the data portion decreased from 20MB to 17MB or 15%, and the index portion from 16M to 12M or 25%.

NOTE: The sample data set was increased for this example.

Conclusion

Even with these simple tables and small data sets it’s clear that INT is a saving of diskspace over BIGINT. In many clients I’ve seen huge savings in multi TB databases, just with a small number of schema optimizations. If this saving alone for a more optimized database design was only 10%, it is an easy 10% that will reflect a direct improvement in performance.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

References

Off to OSCON

I will be heading to my first OSCON next week where I will be presenting MySQL Proxy: from Architecture to Implementation in conjunction with Giuseppe Maxia .

As was written by Colin Charles Our booth is yours… Sun at OSCON, Sun/MySQL would appear to also have a reasonable turnout. So it will be good to see some old colleagues and friends, and hopefully meet some new contacts.

While I am based on the East Coast, I do also provide expert MySQL consulting for clients in any location. Should you like to find out more about my offerings covering Architecture, Performance, Scaling, Migration and Knowledge Transfer for MySQL Solutions, please Contact Me and I will arrange a time to meet next week.

Why SQL_MODE is important? Part I

MySQL pre version 5.0 was very lax in it’s management of valid data. It was easy for data integrity to be abused if you knew how. The most common examples were truncations and silent conversions that if not understood could provide a serious data integrity issue.

In version 5.0, the introduction of SQL_MODE solved this problem. We will look at one example of how SQL_MODE can be enabled to provided improved data integrity.

You want to store the individual RGB (red/green/blue) decimal values of colors in a table. Each of these has a range from 0 to 255. You read that you can store 255 values in a TINYINT Integer data type, so you create a table like:

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);

You insert some data like:

INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('black',0,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('red',255,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('green',0,255,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('yellow',255,255,0);

Great, but when you look at you data you get?

SELECT     name, red, green, blue
FROM       color_to_decimal
ORDER BY name;
+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  127 |
| green  |   0 |   127 |    0 |
| red    | 127 |     0 |    0 |
| white  | 127 |   127 |  127 |
| yellow | 127 |   127 |    0 |
+--------+-----+-------+------+
6 rows in set (0.01 sec)

What happened, you delete the data and re-insert only to find no changes. You have been the victim of a silent conversion, via a means of truncation.

The TINYINT data type is 1 byte (8 bits). 8 bits can store the values from 0 to 255. When you use this integer data type, only 7 bits are actually available, which gives the range of 0 to 127. Why? Because MySQL reserved one bit for the sign, either positive or negative, even though you didn’t want a sign.

So knowing this, you go back and recreate your table with the following definition.

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT UNSIGNED NOT NULL,
green TINYINT UNSIGNED NOT NULL,
blue   TINYINT UNSIGNED NOT NULL);

You load your data and look at it again, and you see.

+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  255 |
| green  |   0 |   255 |    0 |
| red    | 255 |     0 |    0 |
| white  | 255 |   255 |  255 |
| yellow | 255 |   255 |    0 |
+--------+-----+-------+------+
6 rows in set (0.00 sec)

But, should you have been told about this, should there have been an error. Well, in MySQL this is actually a warning, and most applications never support and cater for warnings. It is only when you use the MySQL client program, as in these examples, you are given an indication, with the following line after each insert. If you look closely.

mysql> INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
Query OK, 1 row affected, 1 warning (0.00 sec)


However there is a savior for this situation, and that is SQL_MODE.

When set to the setting TRADITIONAL, an error and not a warning is generated, and most applications support catching errors. Look at what happens in our example using the original table.

SET SQL_MODE=TRADITIONAL;
DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
ERROR 1264 (22003): Out of range value for column 'red' at row 1


As an added benefit you get this data integrity for free. We didn’t test it, because we know the data coming in is in the range of 0-255, but what if the user entered 500 for example. Let’s see.

TRUNCATE TABLE color_to_decimal;
SET SQL_MODE='';
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
SELECT name, red, green, blue FROM color_to_decimal;
SET SQL_MODE=TRADITIONAL;
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);

Looking closely at the response in the client.

mysql> TRUNCATE TABLE color_to_decimal;
Query OK, 0 rows affected (0.02 sec)

mysql> SET SQL_MODE='';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SELECT name, red, green, blue FROM color_to_decimal;
+-------------+-----+-------+------+
| name        | red | green | blue |
+-------------+-----+-------+------+
| a bad color | 127 |     0 |    0 |
+-------------+-----+-------+------+
1 row in set (0.00 sec)

mysql> SET SQL_MODE=TRADITIONAL;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
ERROR 1264 (22003): Out of range value for column 'red' at row 1

As discussed in my presentation Top 20 design tips for Data Architects the UNSIGNED column construct should be always defined unless there is a specific reason not to.

In some respects I would argue that the default for an Integer column should be actually UNSIGNED, and that SIGNED should be specified when you want a sign. Most integer columns generally in schema’s only contain positive numbers. There are of course plenty of examples, positional geo data, financial data, medical data for example.

One could also argue that MySQL should make the default SQL_MODE at least TRADITIONAL, and that only when you want backward compatibility should you then change the SQL_MODE.

This is Part I on SQL_MODE, there are few more interesting cases to discuss at a later time.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

References

Sun Stock Prices

Sun Microsystem’s (NASDAQ:JAVA) hit a low this week of $8.71. There was a stronger rally and a close at $9.16 today. The financial times reports Sun Micro chief sees rays of hope, and Bloomberg Sun Rises After Fourth-Quarter Profit Tops Estimates.

I cashed out in March at $16.32, so that’s like a 50% drop in share price. I was lucky having been at MySQL long enough to have options to vest. Newer employees are not that lucky. I certainly hope MySQL Sun Employees get the Q4 weighted bonuses. (A structure I didn’t believe compensated with the old bonus structure).

I have been following more closely since Matt Asay’s comments in Who is buying Sun?



Image courtesy of Google Financial’s.

Auditing your MySQL Data – Part 2

Continuing from my earlier post Auditing your MySQL Data, Roland has accurately highlighted that my initial post leaves out some important information for auditing. As the original charter was only to keep a history, for the purpose of comparing certain columns, a history was all that was needed.

Providing a history of changes forms the basis of auditing, and in keeping with my post title and intended follow-up, this is the all important second part. However in order to provide true auditing additional information is necessary. This includes:

  1. When was an operation performed
  2. What operation was performed, i.e. INSERT, UPDATE and DELETE
  3. Who performed the operation

Date and operation can be determined via the database, but in order to gather all this information, interaction with the application is necessary to obtain the true user information (This can’t be determined via a trigger)

The issue becomes a greater need for design understanding. What is the purpose of the audit data? How will it be accessed? How complex in maintaining the data do you wish to consider?

One alternative is keep a separate log of audit history. The benefits are a clear and easy way to provide a history of a users’ actions, and can preserve the structure of database table between the base and audit table can remain the same, triggers can remain relatively simply. However if you want to look at the data with audit history, it is better to embed these columns within each table, and triggers have to be customized and maintained in more detail the my original post.

When considering the progression of these points, the design process normally returns to the following conclusion. The following columns are added to the base table.

  • A create_timestamp column is added
  • A last_update_timestamp column is added
  • A last_update_user_id column is added

The create_timestamp is optional from an auditing perspective, because the last_update_timestamp of the first audit row will contain the same value, however experience has shown this column is valuable for other design considerations.

The only remaining issue is the type of operation, INSERT,UPDATE & DELETE. Both INSERT and UPDATE can be inferred, DELETE can not. To maintain the simplicity model, a common approach is to use a BEFORE DELETE trigger to insert an audit record with all the same values of the previous row, with the last_update_timestamp manually set. DELETE can then be determined via a no difference in any updated values.

It ultimate conclusion comes down your application design and needs. For example, your design for example may include a flag or row status for example to indicate deletes which are later cleaned up via a batch process so you don’t really care about the date/time of the actual purging of data. This then negates the need for any DELETE trigger.

Again, thanks to Roland for providing a link to Putting the MySQL information_schema to Use which provides a number of SQL statements that help in the generation of Triggers to support full auditing.

You should be aware that CURRENT_USER normally serves zero purpose if all changes are made via an application user.

At this time, you also have another design consideration. Do you introduce a procedure to re-create the triggers via an automated means for each schema change, or do you manually maintain triggers with schema changes. With each approach, additional checking and verification is necessary to ensure your triggers are correctly configured.