The fast-paced open source ecosystem

This morning at OSCON 08, Tim O’Reilly’s opening keynote Open Source on the O’Reilly Radar included a slide on Drizzle, giving this new project maximum exposure to the Open Source community.

Drizzle was only officially announced yesterday in Drizzle, Clouds, “What If?” by primary architect Brian Aker. Things move fast. There have been a number of comments from people yesterday including Mark Atwood, Monty Widenius, Monty Taylor, Ronald Bradford, Arjen Lentz, Lewis Cunningham, Jeremy Cole, 451 Group, Matt Asay, Assaf Arkin, SlashDot, Builder.au and MySQL HA.

The Drizzle Launchpad project has reached 5th on a Google Search.

Unfortunately, not all uptake and feedback was positive. The official Wikipedia page for Drizzle was marked for speedy deletion almost instantly, and within a few hours permanently deleted.

The new kid on the block – Drizzle

Before today, Drizzle was known as a light form of rain found in Seattle (among other places). Not any more. If you have not read the news already today, Drizzle, Clouds, “What If?” is the new kid on the RDBMS block.

Faster, leaner and designed with the original goals of ease-of-use, reliability and performance, Drizzle will make an impact in those organizations that are seeking a viable database storage solution for large scalable applications. The key to Drizzle is severalfold. First, the crud has been removed. The first part of Drizzle development is to remove bloat and non-functioning software from the MySQL tree. In fact, if you monitor the commits, it reads like: this has been removed, these files have been deleted, this code has been refactored, this new library has been introduced. Design decisions that have limited MySQL’s development for years are simply being cast aside.

The current landscape has become more complicated in 2008. You have the official MySQL releases: 5.0 is becoming ancient (released almost 3 years ago); 5.1 is now clearly a lame duck, with no release date for the past few years (the internal joke was that 5.1 will be released in Q2, but the year is unspecified); and 6.0 is in an identity crisis with beta parts in alpha. These versions are moving so slowly, they are moving towards extinction like the dinosaurs. Monty Widenius is working solidly on Maria (an unofficial MySQL 5.1 branch), probably more stable and possibly released before 5.1. MySQL Cluster has gone its own way; it was shackled by the 5.1 legacy and simply couldn’t wait for a GA product.

Jim Starkey (creator of Falcon, the new 6.0 storage engine) is now working in the clouds with Nimbus DB. Dorsal Source is lying dormant, Proven Scaling has its enterprise binaries and now Percona has its own patched ports. You have strong patches from Google and eBay that have zero hope of ever being introduced into the official MySQL releases, probably until 7.x (5.1 and 6.0 have been frozen for a long time). InnoDB from Oracle invested heavily in new features in a 5.1 plugin, announced at the MySQL Conference, and broken within a day by MySQL releasing a new RC version making it incompatible. Kickfire and Infobright have their own hacked versions, and Nitro DB I suspect has just given up waiting (now around 2 years).

With Sun’s acquisition now at T+6 months, cash and resources don’t appear to have helped the official product. The single greatest movement in this period is that MySQL is now hosted under Launchpad, enabling anybody to access the source code and even create branches, as Jim Winstead reported. However, I doubt you will see this helping code get into the mainline product, but at least it will be more visible. This was an initiative long before the Sun acquisition, and indeed is against the Sun policy of using Mercurial.

So why is Drizzle going to be any different or better?

You start with a committed list of contributors from 6-7 different organizations already. The clear goals of simplification, making it faster and scaling better on multi-core servers, echo the work being done. You have developers who work in real-world situations, not just coders of many years without operational experience, and you have zero sales and marketing getting in the way. Removal of incomplete or stagnant functionality is key for the alpha version; stored procedures, triggers, prepared statements, the query cache, extra data types, full-text and timezones are just the start.

Being small and nimble will enable Drizzle to develop and release code in much shorter iterations. You will see new developments allowing far greater plugin support via the new modularity approach, and far better coding standards, making expert knowledge of MySQL internals less of a requirement to contribute.

Will it fizzle, will it dazzle? Drizzle has the potential to be a stellar product. I’m a supporter and I hope to contribute in some small way.

About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres, his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into providing solutions to problems. For more information, Contact Ronald.

An East Coast option

Within the present MySQL ecosystem, there are limited options for dedicated MySQL Consulting in the US. Outside of the official Sun/MySQL Consulting, Percona and Proven Scaling, both based in Silicon Valley, are the only options generally known and accepted by the MySQL Community.

There is now an east coast option based in New York, and that is Ronald Bradford, providing expert MySQL Consulting in Architecture, Performance, Scalability, Migration and Knowledge Transfer.

With two decades working in the IT industry, Ronald is well qualified in MySQL, having previously provided consulting services for MySQL Inc and combining 9 years of experience with the product. His consulting experience is not limited to MySQL, having also worked extensively with Oracle, and previously with Ingres. More details of this experience are available on LinkedIn.

This week you will find me on the west coast. If you’re at OSCON 2008, please track me down. You can use my Contact Form, email [me] at [this domain], ping me on Twitter, track me on irc://irc.freenode.net (~arabxptyltd) or drop in to my OSCON session at 2:35pm Thursday.

Your data and the cloud

I will be speaking on July 29th in New York at an Entrepreneurs Forum on A Free Panel on Cloud Computing. With a number of experts including Hank Williams of KloudShare, Mike Nolet of AppNexus, and Hans Zaunere of New York PHP fame, it should be a great event.

The focus of my presentation will be on “Extending existing applications to leverage the cloud”, where I will be discussing both the advantages of the cloud, and the complexities and issues that you will encounter, such as data management, data consistency, loss of control, security and latency.

Using traditional MySQL based applications, I’ll be providing an approach that can lead to your application harnessing the greater power of cloud computing.



When (n) counts?

On many engagements I have seen a column data type defined as INT(1).

People have the misconception that this numeric integer data type holds only one digit, or one byte. (One digit is 0-9 and one byte is 0-255.)

This is incorrect.

Integer

For integer numeric data types in MySQL, that is TINYINT, SMALLINT, MEDIUMINT, INT and BIGINT, the (n) has no bearing on the size of data stored within the specific data type. The (n) is simply for display formatting.

In the MySQL Manual, 10.2. Numeric Types, you read: “This optional display width is used to display integer values having a width less than the width specified for the column by left-padding them with spaces. The display width does not constrain the range of values that can be stored in the column, nor the number of digits that are displayed for values having a width exceeding that specified for the column.”

The following example shows that the (n), in this case 3, has no effect on the size of data stored.

DROP TABLE IF EXISTS numeric_int;
CREATE TABLE numeric_int(i INT(3) NOT NULL);
INSERT INTO numeric_int VALUES (1),(22),(333),(444),(55555);
SELECT * FROM numeric_int\G
i: 1
i: 22
i: 333
i: 444
i: 55555
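
If you want to actually see the display width do something, ZEROFILL (which also implies UNSIGNED) left-pads displayed values to the defined width. A quick sketch, using a hypothetical table name, to illustrate:

DROP TABLE IF EXISTS numeric_int_zerofill;
CREATE TABLE numeric_int_zerofill(i INT(3) ZEROFILL NOT NULL);
INSERT INTO numeric_int_zerofill VALUES (1),(22),(333),(55555);
SELECT * FROM numeric_int_zerofill;

The values should display as 001, 022, 333 and 55555; the padding applies only to values narrower than the display width, while 55555 is still stored and shown in full.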

Floating Point

When it comes to the floating point precision of FLOAT and DOUBLE, the syntax of (m,n) has a different interpretation. The manual states: “A precision from 0 to 23 results in a four-byte single-precision FLOAT column. A precision from 24 to 53 results in an eight-byte double-precision DOUBLE column.”
I will discuss this some more in a different post with some interesting findings.

And MySQL allows a non-standard syntax: FLOAT(M,D) or REAL(M,D) or DOUBLE PRECISION(M,D). Here, “(M,D)” means that values can be stored with up to M digits in total, of which D digits may be after the decimal point. For example, a column defined as FLOAT(7,4) will look like -999.9999 when displayed. MySQL performs rounding when storing values, so if you insert 999.00009 into a FLOAT(7,4) column, the approximate result is 999.0001.

So in the case of FLOAT and DOUBLE, the (m,n) affects both storage and presentation, rounding the number, as confirmed by the following test. Look at the last 2 rows for the rounding confirmation.

DROP TABLE IF EXISTS numeric_float;
CREATE TABLE numeric_float(f1 FLOAT(10,5)  NOT NULL);
INSERT INTO numeric_float values (1),(2.0),(3.12345),(4.123451),(5.123456);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0
SELECT * FROM numeric_float\G
f1: 1.00000
f1: 2.00000
f1: 3.12345
f1: 4.12345
f1: 5.12346
5 rows in set (0.01 sec)

Fixed Precision

The DECIMAL data type (NUMERIC is a synonym) stores numbers with a fixed precision. From the manual again: “When declaring a DECIMAL or NUMERIC column, the precision and scale can be (and usually is) specified; for example: salary DECIMAL(5,2).
In this example, 5 is the precision and 2 is the scale. The precision represents the number of significant digits that are stored for values, and the scale represents the number of digits that can be stored following the decimal point. If the scale is 0, DECIMAL and NUMERIC values contain no decimal point or fractional part.”

So in our test:

DROP TABLE IF EXISTS numeric_decimal;
CREATE TABLE numeric_decimal(f1 DECIMAL(10,5)  NOT NULL);
INSERT INTO numeric_decimal values (1),(2.0),(3.12345),(4.123451),(5.123456);
Query OK, 5 rows affected, 2 warnings (0.00 sec)
SELECT * FROM numeric_decimal\G
f1: 1.00000
f1: 2.00000
f1: 3.12345
f1: 4.12345
f1: 5.12346

What is also interesting is that with a FLOAT, the rounding of a number with more than (n) decimal places produces no warnings, yet when using DECIMAL you will see warnings. These are:

INSERT INTO numeric_decimal values (1),(2.0),(3.12345),(4.123451),(5.123456);
Query OK, 5 rows affected, 2 warnings (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 2

mysql> show warnings;
+-------+------+-----------------------------------------+
| Level | Code | Message                                 |
+-------+------+-----------------------------------------+
| Note  | 1265 | Data truncated for column 'f1' at row 4 |
| Note  | 1265 | Data truncated for column 'f1' at row 5 |
+-------+------+-----------------------------------------+
2 rows in set (0.00 sec)

What is also interesting is that the manual states the following: “When such a column is assigned a value with more digits following the decimal point than are allowed by the specified scale, the value is converted to that scale. (The precise behavior is operating system-specific, but generally the effect is truncation to the allowable number of digits.)”

The number is generally truncated, but differs per OS. In the case of Mac OS X and Linux it is rounded. The two test environments in this case were:

mysql> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| protocol_version        | 10                           |
| version                 | 5.1.23-rc                    |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | i686                         |
| version_compile_os      | apple-darwin9.0.0b5          |
+-------------------------+------------------------------+
5 rows in set (0.01 sec)

mysql> show variables like '%version%';
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| protocol_version        | 10                           |
| version                 | 5.1.24-rc                    |
| version_comment         | MySQL Community Server (GPL) |
| version_compile_machine | i686                         |
| version_compile_os      | redhat-linux-gnu             |
+-------------------------+------------------------------+
5 rows in set (0.41 sec)

Conclusion

So just to conclude: (n) for integer types is for display formatting only; for floating point, (m,n) will round the number at n decimal places; while for fixed point, (m,n) will round or truncate the number at n decimal places.



The minimum testing for a shared disk MySQL environment

Recently I was asked to provide guidelines for testing fail over of a MySQL configuration that was provided by a hosting provider.

The first observation was that the client didn’t have any technical details from the hosting provider of what the moving parts were, and also didn’t have any confirmation, other than (I think) a verbal one, that it had been tested.

The first rule in using hosting: never assume. Too many times I’ve seen details from a client stating, for example, the H/W configuration, only to audit and find out otherwise. RAID is a big one, and is generally far more complex to determine. Even for companies with internal systems I’ve seen the most simple question go unanswered. Q: How do you know your RAID is fully operational? A: Somebody will tell us? It’s really amazing to investigate on site with the client and find that the RAID system is running in a degraded mode due to a disk failure and nobody knew.

It took some more digging to realize the configuration in question was with Red Hat Cluster Suite. A word of warning for any clients that use this: DO NOT USE MyISAM. I’ll leave it to the readers to ask me why.

Here is a short list I provided as the minimum requirements I’d test just to ensure the configuration was operational.

Verifying a working Red Hat Cluster Suite MySQL Environment

The MySQL Environment

The database environment consists of two MySQL database servers, configured in an active/passive mode using a shared disk storage via SAN.
For the purposes of the following procedures the active server will be known as the ‘primary’ server, and the passive server will be the ‘secondary server’.
The two physical servers for the purposes of these tests will be defined as ‘alpha’ and ‘beta’, with specific H/W that does not change during these tests.

Normal Operations

Expected configuration under normal operations. A shell sketch for spot-checking these expectations follows the two lists below.

Primary Server

  • server is pingable
  • server accepts SSH Connection
  • MySQL service is started
  • has /data appropriately mounted
  • has assigned VIP address
  • MySQL configuration file and settings are correct

Secondary Server

  • server is pingable
  • server accepts SSH Connection
  • MySQL service IS NOT started
  • DOES NOT have /data mounted
  • DOES NOT have assigned VIP address
  • MySQL configuration file is not available
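
The following is a minimal shell sketch for spot-checking these expectations from a management host. The host names, VIP, mount point and service script path are assumptions and would need to match the actual environment; clustat is the Red Hat Cluster Suite view of which node currently owns the service.

#!/bin/sh
# Hypothetical values - adjust for your environment
PRIMARY=alpha
VIP=192.168.100.10
DATA=/data

# server is pingable and accepts SSH connections
ping -c 1 $PRIMARY
ssh $PRIMARY "uname -n"

# MySQL service is started, /data is mounted, VIP is assigned
ssh $PRIMARY "/etc/init.d/mysqld status"
ssh $PRIMARY "mount | grep $DATA"
ssh $PRIMARY "/sbin/ip addr | grep $VIP"

# Red Hat Cluster Suite view of service ownership
ssh $PRIMARY "clustat"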

1. Reboot servers ‘alpha’ and ‘beta’.

Test Status:

  • alpha server is the designated primary server
  • alpha and beta servers are operational

Action:
1.1 Restart alpha server (init 6)
1.2 Restart beta server (init 6)

Checklist:
1.3 Alpha server matches primary server configuration
1.4 Beta server matches secondary server configuration

2. Controlled fail over from ‘alpha’ to ‘beta’

Test Status:

  • alpha server is the designated primary server
  • alpha and beta servers are operational

Action:
2.1 Alpha server – Instigate Cluster failover (clusvcadm -r mysql-svc)

Checklist:
2.2 Beta server matches primary server configuration
2.3 Alpha server matches secondary server configuration

3. Controlled failover from ‘beta’ to ‘alpha’

Test Status:

  • beta server is the designated primary server
  • alpha and beta servers are operational

Action:
3.1 beta server – Instigate Cluster failover (clusvcadm -r mysql-svc)

Checklist:
3.2 Alpha server matches primary server configuration
3.3 Beta server matches secondary server configuration

Exception Operations

4. Loss of connectivity to primary server

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
4.1 Stop networking services on ‘alpha’ (ifdown bond0)

Checklist:
4.2 Monitoring detects and reports connectivity loss
4.3 Automated failover occurs
4.4 Beta server matches primary server configuration
4.5 Alpha server matches secondary server configuration

5. Restore connectivity to secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online, but not accessible via private IP

Action:
5.1 Start networking services on ‘alpha’ (ifup bond0)

Checklist:
5.2 Monitoring detects and reports connectivity restored
5.3 No failback occurs
5.4 Beta server matches primary server configuration
5.5 Alpha server matches secondary server configuration

6. Loss of connectivity to secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online

Action:
6.1 Stop networking services on ‘alpha’ (ifdown bond0)

Checklist:
6.2 Monitoring detects and reports connectivity lost
6.3 No failback occurs
6.4 Beta server matches primary server configuration
6.5 Alpha server matches secondary server configuration

7. Restore connectivity to secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online, but not accessible via private IP

Action:
7.1 Start networking services on ‘alpha’ (ifup bond0)

Checklist:
7.2 Monitoring detects and reports connectivity restored
7.3 No failback occurs
7.4 Beta server matches primary server configuration
7.5 Alpha server matches secondary server configuration

8. Power down secondary server

Test Status:

  • beta server is the designated primary server
  • alpha server is online

Action:
8.1 Power down alpha (init 0) NOTE: Need remote boot capabilities

Checklist:
8.2 Monitoring detects and reports connectivity lost
8.3 Beta server matches primary server configuration
8.4 Additional paging for extended down time for ‘degraded support for failover’

9. Power down primary server

Test Status:

  • beta server is the designated primary server
  • alpha server is offline

Action:
9.1 Power down beta (init 0) NOTE: Need remote boot capabilities

Checklist:
9.2 Monitoring detects and reports connectivity lost
9.3 Site database connectivity completely unavailable
9.4 Additional paging for loss of HA solution

10. Power restored to secondary server
Test Status:

  • alpha server is offline
  • beta server is offline

Action:
10.1 Power on alpha

Checklist:
10.2 Monitoring detects and reports server up
10.3 Alpha server assumes primary role (previously it was beta)
10.4 Alpha server matching primary server configuration
10.5 Additional paging for degraded HA

11. Power restored to secondary server

Test Status:

  • alpha server is primary server
  • beta server is offline

Action:
11.1 Power on beta

Checklist:
11.2 Monitoring detects and reports server up
11.3 Alpha server matching primary server configuration
11.4 Beta server matching secondary server configuration

Database Operations

12. MySQL services on primary server go offline

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
12.1 Stop mysql services on ‘alpha’ (/etc/init.d/mysqld stop)

Checklist:
12.2 Monitoring detects and reports database loss (while connectivity is still available)
12.3 Automated failover occurs
12.4 Beta server matches primary server configuration
12.5 Alpha server matches secondary server configuration

13. MySQL services on secondary server go offline

Test Status:

  • beta server is the designated primary server
  • alpha server is online

Action:
13.1 stop mysql services on ‘beta’ (/etc/init.d/mysqld stop)

Checklist:
13.2 Monitoring detects and reports database loss (while connectivity is still available)
13.3 Automated failover occurs
13.4 Alpha server matches primary server configuration
13.5 beta server matches secondary server configuration

14. Load Testing during failure

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
14.1 Aggressive load testing against the database server
14.2 MySQL killed without prejudice (killall -9 mysqld_safe mysql)

Checklist:
14.3 Monitoring detects and reports mysql service loss
14.4 Automated failover occurs
14.5 Beta server matches primary server configuration
14.6 Alpha server matches secondary server configuration
14.7 Beta MySQL log shows a forced MySQL recovery

15. Forced Recovery

Test Status:

  • alpha server is the designated primary server
  • beta server is online

Action:
15.1 Manual full database backup is done (in case recovery does not work). Hosting Provider not told of this.
15.2 Dummy new table/schema is created (used as verification point)
15.3 Database on alpha primary server is dropped
15.4 Hosting Provider is notified stating a full database recovery including Point In time to just before drop (no time given, only command that was run)

Checklist:
15.5 Site is marked as unavailable
15.6 Hosting Provider restore data from backup and recover to point in time
15.7 Confirmation that new table/schema is restored, and full schema is available
15.8 Site is made available
15.9 Record of time for full disaster is recorded

Conclusion

This is not an exhaustive test; in fact, it is just a documented approach for consideration, to show a client what the minimum testing should be. As no dry run actually occurred, there may be inaccuracies and additions necessary to this document when first executed. I would need access to an appropriate configuration in order to perform a level of testing to complete this document.



BIGINT v INT. Is there a big deal?

The answer is yes.

In this face off we have two numeric MySQL data types, both integer. In fact, MySQL has 9 different numeric data types for integer, fixed precision and floating point numbers; however, we are just going to focus on two: BIGINT and INT. This design consideration is part of my recent presentation Top 20 Design Tips for Data Architects.

What is the difference?
We turn to the MySQL Reference Manual first, in 10.1.1. Overview of Numeric Types, where we see the following.


INT[(M)] [UNSIGNED] [ZEROFILL]

A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.

BIGINT[(M)] [UNSIGNED] [ZEROFILL]

A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.

OK, so an INT can store a value up to 2.1 billion, and a BIGINT can store a value up to some larger number of 20 digits. That MySQL search didn’t help much with details; we have to dig deeper to find 10.2. Numeric Types, in which we find that INT is a 4 byte integer, and BIGINT is an 8 byte integer.

So what’s the big deal?

Quite a lot actually. Using INT rather than BIGINT can make a significant reduction in disk space. Just this one change alone can save you 10%-20% (depending on your particular situation). More significantly, when used as a primary key, and for foreign keys and indexes, the reduction in your index size could be 50%, and this will improve performance when these indexes are used.

My approach is this. Let’s just focus on primary keys and foreign keys to begin with. Are you going to store more than 2.1 billion rows in your table? The answer should be no. Should you say yes, then you do have grand plans, but you are also failing to consider the ramifications of handling larger data sets (a topic for later discussion).

There are exceptions to this rule: if you do a huge number of inserts and deletes, then while you may not have 2.1 billion rows, you may have done 2.1 billion inserts. Again, better design practices should be considered in this case.
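
If you are unsure whether an INT primary key is at risk, the current AUTO_INCREMENT counters can be compared against the unsigned INT maximum (4294967295). A sketch using INFORMATION_SCHEMA, which assumes your surrogate keys are AUTO_INCREMENT columns:

SELECT table_schema, table_name, auto_increment,
       ROUND(auto_increment/4294967295*100,4) AS pct_of_int_unsigned
FROM   information_schema.tables
WHERE  auto_increment IS NOT NULL
ORDER BY 4 DESC;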

The Test

As with everything, we need some evidence to stake the claim, using the Sakila sample database.

We start with a simple intersection table that has a high proportion of numeric-only columns. This will show the best case situation.

We will create two tables, one with all BIGINT columns, and one with all INT columns and then compare the size. These tables are only small, but they show the proportion of savings of disk space.

CREATE TABLE inventory_bigint LIKE inventory;
ALTER TABLE inventory_bigint
  MODIFY inventory_id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY film_id BIGINT UNSIGNED NOT NULL,
  MODIFY store_id BIGINT UNSIGNED NOT NULL;
INSERT INTO inventory_bigint SELECT * from inventory;
CREATE TABLE inventory_int LIKE inventory;
ALTER TABLE inventory_int
  MODIFY inventory_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY film_id INT UNSIGNED NOT NULL,
  MODIFY store_id INT UNSIGNED NOT NULL;
INSERT INTO inventory_int SELECT * from inventory;

select table_name,engine,row_format, table_rows, avg_row_length,
        (data_length+index_length)/1024/1024 as total_mb,
         (data_length)/1024/1024 as data_mb,
         (index_length)/1024/1024 as index_mb
from information_schema.tables
where table_schema='sakila'
and   table_name LIKE 'inventory%'
order by 6 desc;
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| table_name       | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb    |
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| inventory_bigint | InnoDB | Compact    |     293655 |             51 | 43.60937500 | 14.51562500 | 29.09375000 |
| inventory_int    | InnoDB | Compact    |     293715 |             37 | 29.54687500 | 10.51562500 | 19.03125000 |
| inventory        | InnoDB | Compact    |     293707 |             33 | 22.54687500 |  9.51562500 | 13.03125000 |
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
3 rows in set (0.15 sec)

In this example, the data portion decreased from 14MB to 10MB or 28%, and the index portion from 29MB to 19MB or 34%.

CREATE TABLE customer_bigint LIKE customer;
ALTER TABLE customer_bigint
     MODIFY customer_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
     MODIFY store_id BIGINT UNSIGNED NOT NULL,
     MODIFY address_id BIGINT UNSIGNED NOT NULL,
     MODIFY active BIGINT UNSIGNED NOT NULL;

CREATE TABLE customer_int LIKE customer;
ALTER TABLE customer_int
     MODIFY customer_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
     MODIFY store_id INT UNSIGNED NOT NULL,
     MODIFY address_id INT UNSIGNED NOT NULL,
     MODIFY active INT UNSIGNED NOT NULL;

select table_name,engine,row_format, table_rows, avg_row_length,
        (data_length+index_length)/1024/1024 as total_mb,
         (data_length)/1024/1024 as data_mb,
         (index_length)/1024/1024 as index_mb
from information_schema.tables
where table_schema='sakila'
and   table_name LIKE 'customer%'
order by 6 desc;

+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| table_name      | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb    |
+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| customer_bigint | InnoDB | Compact    |     154148 |            139 | 37.09375000 | 20.54687500 | 16.54687500 |
| customer_int    | InnoDB | Compact    |     151254 |            121 | 30.06250000 | 17.51562500 | 12.54687500 |
| customer        | InnoDB | Compact    |      37684 |            125 |  7.81250000 |  4.51562500 |  3.29687500 |
| customer_list   | NULL   | NULL       |       NULL |           NULL |        NULL |        NULL |        NULL |
+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
4 rows in set (0.22 sec)

In this example, the data portion decreased from 20MB to 17MB or 15%, and the index portion from 16MB to 12MB or 25%.

NOTE: The sample data set was increased for this example.

Conclusion

Even with these simple tables and small data sets, it’s clear that INT is a saving of disk space over BIGINT. In many clients I’ve seen huge savings in multi-TB databases, just with a small number of schema optimizations. If this saving alone for a more optimized database design was only 10%, it is an easy 10% that will reflect a direct improvement in performance.



Off to OSCON

I will be heading to my first OSCON next week, where I will be presenting MySQL Proxy: from Architecture to Implementation in conjunction with Giuseppe Maxia.

As Colin Charles wrote in Our booth is yours… Sun at OSCON, Sun/MySQL would appear to also have a reasonable turnout. So it will be good to see some old colleagues and friends, and hopefully meet some new contacts.

While I am based on the East Coast, I do also provide expert MySQL consulting for clients in any location. Should you like to find out more about my offerings covering Architecture, Performance, Scaling, Migration and Knowledge Transfer for MySQL Solutions, please Contact Me and I will arrange a time to meet next week.

Why is SQL_MODE important? Part I

MySQL prior to version 5.0 was very lax in its management of valid data. It was easy for data integrity to be abused if you knew how. The most common examples were truncations and silent conversions that, if not understood, could present a serious data integrity issue.

In version 5.0, the introduction of SQL_MODE addressed this problem. We will look at one example of how SQL_MODE can be enabled to provide improved data integrity.

You want to store the individual RGB (red/green/blue) decimal values of colors in a table. Each of these has a range from 0 to 255. You read that a TINYINT integer data type can store values up to 255, so you create a table like:

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);

You insert some data like:

INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('black',0,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('red',255,0,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('green',0,255,0);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('yellow',255,255,0);

Great, but when you look at your data, what do you get?

SELECT     name, red, green, blue
FROM       color_to_decimal
ORDER BY name;
+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  127 |
| green  |   0 |   127 |    0 |
| red    | 127 |     0 |    0 |
| white  | 127 |   127 |  127 |
| yellow | 127 |   127 |    0 |
+--------+-----+-------+------+
6 rows in set (0.01 sec)

What happened? You delete the data and re-insert, only to find no changes. You have been the victim of a silent conversion, by means of truncation.

The TINYINT data type is 1 byte (8 bits). 8 bits can store the values from 0 to 255. When you use this integer data type, only 7 bits are actually available for the magnitude, which gives the positive range of 0 to 127 (the full signed range is -128 to 127). Why? Because MySQL reserves one bit for the sign, either positive or negative, even though you didn’t want a sign.

So knowing this, you go back and recreate your table with the following definition.

DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT UNSIGNED NOT NULL,
green TINYINT UNSIGNED NOT NULL,
blue   TINYINT UNSIGNED NOT NULL);

You load your data and look at it again, and you see.

+--------+-----+-------+------+
| name   | red | green | blue |
+--------+-----+-------+------+
| black  |   0 |     0 |    0 |
| blue   |   0 |     0 |  255 |
| green  |   0 |   255 |    0 |
| red    | 255 |     0 |    0 |
| white  | 255 |   255 |  255 |
| yellow | 255 |   255 |    0 |
+--------+-----+-------+------+
6 rows in set (0.00 sec)

But should you have been told about this? Should there have been an error? Well, in MySQL this is actually a warning, and most applications never support or cater for warnings. It is only when you use the MySQL client program, as in these examples, that you are given an indication, with the following line after each insert, if you look closely.

mysql> INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('blue',0,0,255);
Query OK, 1 row affected, 1 warning (0.00 sec)


However there is a savior for this situation, and that is SQL_MODE.

When it is set to TRADITIONAL, an error rather than a warning is generated, and most applications do support catching errors. Look at what happens in our example using the original table.

SET SQL_MODE=TRADITIONAL;
DROP TABLE IF EXISTS color_to_decimal;
CREATE TABLE color_to_decimal(
name VARCHAR(20) NOT NULL PRIMARY KEY,
red    TINYINT NOT NULL,
green TINYINT NOT NULL,
blue   TINYINT NOT NULL);
INSERT INTO color_to_decimal (name, red,green,blue) VALUES ('white',255,255,255);
ERROR 1264 (22003): Out of range value for column 'red' at row 1


As an added benefit you get this data integrity for free. We didn’t test it, because we know the data coming in is in the range of 0-255, but what if the user entered 500, for example? Let’s see.

TRUNCATE TABLE color_to_decimal;
SET SQL_MODE='';
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
SELECT name, red, green, blue FROM color_to_decimal;
SET SQL_MODE=TRADITIONAL;
INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);

Looking closely at the response in the client.

mysql> TRUNCATE TABLE color_to_decimal;
Query OK, 0 rows affected (0.02 sec)

mysql> SET SQL_MODE='';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SELECT name, red, green, blue FROM color_to_decimal;
+-------------+-----+-------+------+
| name        | red | green | blue |
+-------------+-----+-------+------+
| a bad color | 127 |     0 |    0 |
+-------------+-----+-------+------+
1 row in set (0.00 sec)

mysql> SET SQL_MODE=TRADITIONAL;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO color_to_decimal (name, red, green, blue) VALUES('a bad color',500,0,0);
ERROR 1264 (22003): Out of range value for column 'red' at row 1

As discussed in my presentation Top 20 design tips for Data Architects, the UNSIGNED column construct should always be defined unless there is a specific reason not to.

In some respects I would argue that the default for an integer column should actually be UNSIGNED, and that SIGNED should be specified when you want a sign. Most integer columns in schemas generally contain only positive numbers. There are of course plenty of exceptions: positional geo data, financial data and medical data, for example.

One could also argue that MySQL should make the default SQL_MODE at least TRADITIONAL, and that only when you want backward compatibility should you then change the SQL_MODE.
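
If you do want TRADITIONAL as the default for a server, it is a single line in the [mysqld] section of my.cnf, or it can be set dynamically for subsequent connections; a minimal sketch:

# my.cnf
[mysqld]
sql_mode = TRADITIONAL

# or at runtime, affecting new connections
SET GLOBAL sql_mode = 'TRADITIONAL';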

This is Part I on SQL_MODE; there are a few more interesting cases to discuss at a later time.



Sun Stock Prices

Sun Microsystems (NASDAQ:JAVA) hit a low this week of $8.71. There was a stronger rally and a close at $9.16 today. The Financial Times reports Sun Micro chief sees rays of hope, and Bloomberg Sun Rises After Fourth-Quarter Profit Tops Estimates.

I cashed out in March at $16.32, so that’s about a 50% drop in share price. I was lucky, having been at MySQL long enough to have options vest. Newer employees are not that lucky. I certainly hope MySQL Sun employees get the Q4 weighted bonuses (a structure I didn’t believe compensated as well as the old bonus structure).

I have been following more closely since Matt Asay’s comments in Who is buying Sun?



Image courtesy of Google Finance.

Auditing your MySQL Data – Part 2

Continuing from my earlier post Auditing your MySQL Data, Roland has accurately highlighted that my initial post left out some important information for auditing. As the original charter was only to keep a history for the purpose of comparing certain columns, a history was all that was needed.

Providing a history of changes forms the basis of auditing, and in keeping with my post title and intended follow-up, this is the all-important second part. However, in order to provide true auditing, additional information is necessary. This includes:

  1. When was an operation performed
  2. What operation was performed, i.e. INSERT, UPDATE and DELETE
  3. Who performed the operation

The date and operation can be determined via the database, but in order to gather all this information, interaction with the application is necessary to obtain the true user information (this can’t be determined via a trigger).

The issue becomes a greater need for design understanding. What is the purpose of the audit data? How will it be accessed? How complex in maintaining the data do you wish to consider?

One alternative is to keep a separate log of audit history. The benefits are a clear and easy way to provide a history of a user’s actions, the structure of the base and audit tables can remain the same, and triggers can remain relatively simple. However, if you want to look at the data with its audit history, it is better to embed these columns within each table, and triggers then have to be customized and maintained in more detail than in my original post.

When considering the progression of these points, the design process normally returns to the following conclusion. The following columns are added to the base table.

  • A create_timestamp column is added
  • A last_update_timestamp column is added
  • A last_update_user_id column is added

The create_timestamp is optional from an auditing perspective, because the last_update_timestamp of the first audit row will contain the same value; however, experience has shown this column is valuable for other design considerations.

The only remaining issue is the type of operation: INSERT, UPDATE and DELETE. Both INSERT and UPDATE can be inferred; DELETE can not. To maintain the simplicity model, a common approach is to use a BEFORE DELETE trigger to insert an audit record with all the same values as the previous row, with the last_update_timestamp manually set. A DELETE can then be determined by there being no difference in any updated values.
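
To make this concrete, here is a minimal, untested sketch against the customer table used in my original post. The column names are those listed above; the customer_history view and the audit_id column are assumptions carried over from that post, and the same new columns must also be added to the audit table so the two structures stay in sync.

ALTER TABLE customer
  ADD create_timestamp DATETIME NOT NULL,
  ADD last_update_timestamp TIMESTAMP NOT NULL
      DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ADD last_update_user_id INT UNSIGNED NULL;

DELIMITER $$
DROP TRIGGER IF EXISTS customer_brd$$
CREATE TRIGGER customer_brd
BEFORE DELETE ON customer
FOR EACH ROW
BEGIN
  -- Copy the row as it stands, then stamp the audit copy with the delete time.
  -- The DELETE is later inferred by this audit row showing no changed values.
  INSERT INTO customer_history
  SELECT * FROM customer
  WHERE  customer_id = OLD.customer_id;
  UPDATE customer_history
  SET    last_update_timestamp = NOW()
  WHERE  audit_id = LAST_INSERT_ID();
END;
$$
DELIMITER ;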

The ultimate conclusion comes down to your application design and needs. For example, your design may include a flag or row status to indicate deletes, which are later cleaned up via a batch process, so you don’t really care about the date/time of the actual purging of data. This then negates the need for any DELETE trigger.

Again, thanks to Roland for providing a link to Putting the MySQL information_schema to Use, which provides a number of SQL statements that help in the generation of triggers to support full auditing.

You should be aware that CURRENT_USER normally serves zero purpose if all changes are made via an application user.

At this time, you also have another design consideration: do you introduce a procedure to re-create the triggers via an automated means for each schema change, or do you manually maintain triggers with schema changes? With either approach, additional checking and verification is necessary to ensure your triggers are correctly configured.

Auditing your MySQL Data

I was asked recently by a client to help with providing a history of data in certain tables. Like most problems, there is no one single solution, and in this case there are several possible solutions. I was able to provide a database-only solution, with just minimal impact to the existing schema.

Here is my approach; your feedback and alternative input is, as always, welcome.

The problem

Client: I want to keep a history of all changes to two tables, and have a means of viewing this history.

For the purposes of this solution, we will use one table, called ‘customer’ from the Sakila Sample Database.

Solution

For tables to be audited, we will introduce a new column called ‘audit_id’ which is NULLABLE, and hopefully will not affect any existing INSERT statements providing column naming (a Best Practice) is used.
We do this to ensure that the Audit Table has both the same structure (number and ordering of columns), and can have a Primary Key defined.

Schema Preparation

mysql> USE sakila;
mysql> ALTER TABLE customer ADD audit_id INT UNSIGNED NULL;

We can then create an Audit Schema to store the Audit Table. This helps to ensure a clean schema and support for appropriate backup and recovery, using a standard of suffixing existing schemas with ‘_audit’.

mysql> CREATE DATABASE IF NOT EXISTS sakila_audit;
mysql> USE sakila_audit;
mysql> SET FOREIGN_KEY_CHECKS=0;   # (1)
mysql> CREATE TABLE customer LIKE sakila.customer;
mysql> ALTER TABLE customer DROP KEY `email`;   # (2)
mysql> # Foreign Keys (3)
mysql> ALTER TABLE customer
           DROP PRIMARY KEY,
           MODIFY customer_id SMALLINT UNSIGNED NOT NULL,  # (4)
           MODIFY audit_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

NOTES:
(1) Due to Foreign Key constraints, FOREIGN_KEY_CHECKS is disabled while creating the audit table.
(2) We need to remove all existing UNIQUE keys. I use an I_S SQL statement, as shown below, to find these programmatically; you can also view them via SHOW CREATE TABLE customer;
(3) It appears that Foreign Keys are not created with the LIKE syntax. This may change in the future, so this note is to check appropriately.
(4) We need to drop the primary key, but as this involves an AUTO_INCREMENT column we need to alter that as well, and this step involves naming the primary key column, which will vary per table.

We now have an Audit table in a separate schema that has the same columns. Part of the process of any new schema release is to ensure these tables are kept in sync. An appropriate I_S statement could be used for verification. In this case, support for a TRIGGER to run on instance startup and throw an error to the error log or system error log (possible via a UDF) would enable a balance check here.
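
As an example, a simple balance check could compare column names and ordering between the two schemas; the following sketch returns rows only where the audit table is missing a column or has it in a different position (schema and table names hard-coded for brevity).

SELECT b.ordinal_position, b.column_name AS base_column, a.column_name AS audit_column
FROM   information_schema.columns b
LEFT JOIN information_schema.columns a
       ON  a.table_schema = 'sakila_audit'
       AND a.table_name = b.table_name
       AND a.ordinal_position = b.ordinal_position
WHERE  b.table_schema = 'sakila'
AND    b.table_name = 'customer'
AND    (a.column_name IS NULL OR a.column_name <> b.column_name)
ORDER BY b.ordinal_position;

An empty result means the base columns all line up; note it does not catch extra columns that exist only in the audit table.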

Access to Audit Information

We provide a VIEW of the Audit table for history purposes. In addition, we use this for simplification of trigger management.

mysql> CREATE VIEW sakila.customer_history AS SELECT * FROM sakila_audit.customer;

NOTE: No index optimization has been performed on the Audit Table. It would be anticipated that existing Indexes could indeed be dropped and replaced with new indexes appropriate for data access.

Trigger Creation

In order to keep a copy of all data, we introduce two database triggers to manage a copy of all data in the history table. It is possible to say that history consists of the current version (in the base table) and all previous versions in the Audit Table; this requires 3 triggers. An alternative is to keep a full copy of all versions, including the current one, in the Audit Table; this requires 2 triggers and takes more disk space, however it is a simpler and cleaner implementation.

USE sakila;
DELIMITER $$
DROP TRIGGER IF EXISTS customer_ari$$
CREATE TRIGGER customer_ari
AFTER INSERT ON customer
FOR EACH ROW
BEGIN
  INSERT INTO customer_history
  SELECT * FROM customer
  WHERE  customer_id = NEW.customer_id;
END;
$$
DROP TRIGGER IF EXISTS customer_aru$$
CREATE TRIGGER customer_aru
AFTER UPDATE ON customer
FOR EACH ROW
BEGIN
  INSERT INTO customer_history
  SELECT * FROM customer
  WHERE  customer_id = NEW.customer_id;
END;
$$
DELIMITER ;

NOTE: I do not generally like to use ‘SELECT *’, however in this situation the trigger is significantly simplified. This is of benefit if you are maintaining audit triggers on many tables. The disadvantage is you must ensure your schema tables (e.g. sakila and sakila_audit) are always kept in sync with the same number and order of columns. Failing to add a column to the audit database will result in an error, which is a good confirmation. Failing to add a column in the right order may corrupt your data. Exercise caution when modifying the schema in this situation.

Testing

As with any proper coding, we need to test this. The following sample SQL was run to test on a sample database.

SET FOREIGN_KEY_CHECKS=0;
USE sakila;
TRUNCATE TABLE sakila.customer;
TRUNCATE TABLE sakila_audit.customer;
SELECT 'no customer data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'no customer history data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
INSERT INTO customer (customer_id,store_id,first_name,last_name,email,address_id,active,create_date)
              VALUES(NULL,1,'mickey','mouse',',[email protected]',1,TRUE,NOW());
SELECT 'customer data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
INSERT INTO customer(customer_id,store_id,first_name,last_name,email,address_id,active,create_date)
              VALUES(NULL,1,'donald','duck',',[email protected]',1,TRUE,NOW());
SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
UPDATE customer SET email='[email protected]' where email='[email protected]';
SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
DELETE FROM customer where email='[email protected]';
SELECT 'customer data = 1 rows', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
DELETE FROM customer;
SELECT 'customer data = 0 rows', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;

Improvements

It would be great if MySQL’s procedural language were a little more flexible and robust. Here are some improvements I’d love to see that would enable a more programmatic solution, as the above contains a number of dependencies on schema_name and column_name:

  • Raise Error Handling to throw errors appropriately
  • Anonymous code block support, e.g. BEGIN …code… END; and an automatic execution, not the need to create a Procedure then execute it.
  • Ability to execute dynamic SQL more easily, for example CREATE DATABASE IF NOT EXISTS @variable; or CREATE VIEW @schema.@table_name_history FROM …
  • Support for multiple type (BEFORE|AFTER INSERT|UPDATE|DELETE) triggers per table.

INFORMATION_SCHEMA Query

mysql> SET @schema='sakila';
mysql> SET @table='customer';
mysql> SELECT DISTINCT CONCAT('ALTER TABLE ', table_name, ' DROP KEY `',constraint_name,'`;') AS cmd
           FROM INFORMATION_SCHEMA.table_constraints
           WHERE constraint_schema=@schema
           AND table_name=@table
           AND constraint_type='UNIQUE';

Getting Started with Simple DB

With my continued investigation of alternative data management options in cloud computing, I’m now evaluating Amazon SimpleDB. Still in restricted beta, it helps to have a friend on the inside.

Working through the Getting Started Guide (API Version 2007-11-07) was OK, but annoying in parts. Here are some issues I found. I was working with Java as the programming language.

  • The docs enable you to view the language syntax in Java, C#, Perl, PHP, VB.NET and ScratchPad. You can also restrict the view to a specific language, a rather cool feature. One observation is there is no Python, which is rather ironic as my first investigation was Google App Engine (GAE), and the only language there is Python. Something I had to learn first.
  • Preparing the Samples asks you to download the Amazon SimpleDB Sample code, but this is not actually a link to the sample code but an index to Community Code. I used the Java Library for Amazon SimpleDB, which wasn’t even on the first page of results.
  • The supplied docs for specifying the Classpath were rather wrong; it helps to simply find all .jar files and include these. Mine looks like:
    • #!/bin/sh
      #  http://docs.amazonwebservices.com/AmazonSimpleDB/2007-11-07/GettingStartedGuide/?ref=get-started
      
      SDB_HOME="/put/directory/to/unzip/here"
      export CLASSPATH=$CLASSPATH:
      $SDB_HOME/src/com/amazonaws/sdb/samples/:
      $SDB_HOME/lib/amazon-simpledb-2007-11-07-java-library.jar:
      $SDB_HOME/third-party/log4j-1.2.14/log4j-1.2.14.jar:
      $SDB_HOME/third-party/commons-codec-1.3/commons-codec-1.3.jar:
      $SDB_HOME/third-party/commons-logging-1.1/commons-logging-1.1.jar:
      $SDB_HOME/third-party/jaxb-ri-2.1/jaxb-xjc.jar:
      $SDB_HOME/third-party/jaxb-ri-2.1/activation.jar:
      $SDB_HOME/third-party/jaxb-ri-2.1/jaxb-impl.jar:
      $SDB_HOME/third-party/jaxb-ri-2.1/jaxb-api.jar:
      $SDB_HOME/third-party/jaxb-ri-2.1/jsr173_1.0_api.jar:
      $SDB_HOME/third-party/commons-httpclient-3.0.1/commons-httpclient-3.0.1.jar
      
  • All examples in the docs then refer to making changes such as “invokeCreateDomain(service, action); line and add the following lines after // @TODO: set action parameters here:”. The problem is the samples don’t have the action variable, but rather a variable called request. The comment in the code “// @TODO: set request parameters here” is at least accurate.
  • The docs contain a lot of Java syntax that would not, for example, compile correctly. Plenty of occurrences of a missing semicolon ‘;’.
  • Each example defines:
    String accessKeyId = "";
    String secretAccessKey = "";
    OK for the first example, but as soon as I moved to the second, I re-factored these into an Interface called Constants.
  • In all the examples, they never provide any sample output; this would help just to confirm stuff. In the Java Library for Amazon SimpleDB download link, there is an example output, but it’s outdated, with a new data attribute called BoxUsage. My output is:
    • CreateDomain Action Response
      =============================================================================
      
          CreateDomainResponse
              ResponseMetadata
                  RequestId
                      f04df8eb-71fa-4d4e-9bd5-cc98e853a2e4
                  BoxUsage
                      0.0055590278
      
  • And now some specifics. In a relational database such as MySQL, you have Instance/Schema/Table/Column. Within SimpleDB it would appear that you need a separate AWS account for Instance management. That’s probably a good thing as it will enable tracking of costs. There appears to be no concept of a Schema. Data is stored in Domains; this is the equivalent of a Table. Within each Domain, you specify Attributes, a correlation with Columns. One key difference is the ability to define a set of Attributes with the same identity (much like a list that is supported via Python/GAE). For any row of data, you must specify an itemName, this being equivalent to a Primary Key. These names take me back to the old days (20 years ago) of Logical Data Models that used entities, attributes and relationships.
  • The term Replace is used when updating data for a given row.
  • When retrieving data, you first return a list of itemNames; then you can view the Attributes for each given item.
  • You can perform a simple where qualification using a Query Expression, including against multiple Attributes via intersection syntax
  • An observation that is of significant concern is the lack of security against any type of operation. The Getting Started guide ends with Deleting the Domain. Is there no means to define permissions against types of users, such as an Application User, and a Database Administrator for managing the objects?

Well, it took me longer to write this post than to run through the example, but at least on a lazy Sunday afternoon, a first look at SimpleDB was quite simple.

I did also run into an error initially. I first started just via the CLI under Linux (CentOS 5), but switched back to installing Eclipse on Mac OS X for better error management, and of course this error didn’t occur.

Do you store credit cards in your MySQL Database?

The Payment Card Industry Data Security Standard (PCI DSS) has been developed to help organizations that process card payments to prevent credit card fraud, cracking and various other security vulnerabilities and threats.

This has been developed by the major credit card companies such as MasterCard and Visa. If one of the companies that created the standard, MasterCard International, uses PCI General for MySQL, then you would be confident that the software is of the highest quality to satisfy all requirements.

A few questions to consider.

Q: Why is PCI compliance important?
A: Credit card companies will start to demand that organizations storing credit card numbers have adequate security over their (as in the credit card company’s) data.

Q: How can I support PCI compliance with minimal impact?
A: Any solution that requires coding changes, and then the necessary testing and verification, will incur a large cost for a successful deployment. A turnkey solution that can be implemented in a near seamless manner, without code changes, is ideal for any company.

PCI General for MySQL achieves this. With the security and encryption managed at the Operating System kernel level, MySQL data and application communications are totally secure. Introducing PCI General via MySQL replication, with a controlled fail-over to a MySQL slave running under PCI General, is the simplest and easiest method of introducing PCI compliance into your production environment.

For more information, please visit PCI General for MySQL or Contact Me and I’ll direct you to additional information.

Deleting from ARCHIVE tables

I can’t say I’ve used the ARCHIVE storage engine before, but at the NY MySQL Meetup last night there was discussion of the improvements to ARCHIVE in 5.1 and the fact that you could not DELETE from archive. A simple test confirmed this indeed throws an error.

DROP TABLE IF EXISTS url_log;
CREATE TABLE url_log(
log_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
log_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id INT UNSIGNED NULL,
url VARCHAR(100) NOT NULL,
PRIMARY KEY (log_id))
ENGINE=ARCHIVE;

DELETE FROM url_log;
ERROR 1031 (HY000): Table storage engine for 'url_log' doesn't have this option

However, as part of MySQL 5.1, which is at RC status, there is partitioning. Thinking that one could partition, say, a log table by DAY OF MONTH, and then do what you want with the data in a partition and delete the partition, I tried the following test.
NOTE: for the purposes of testing, I used SECOND() rather than DAY() and smaller ranges for simplicity.

DROP TABLE IF EXISTS url_log;
CREATE TABLE url_log(
log_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
log_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id INT UNSIGNED NULL,
url VARCHAR(100) NOT NULL,
PRIMARY KEY (log_id))
ENGINE=ARCHIVE
PARTITION BY RANGE ( SECOND(log_date) ) (
    PARTITION p0 VALUES LESS THAN (10),
    PARTITION p1 VALUES LESS THAN (20),
    PARTITION p2 VALUES LESS THAN (30),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

However this throws an error.

ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

Primary keys and AUTO_INCREMENT do not play well with partitioning, so for the purpose of this proof of concept, I’ll drop these.

DROP TABLE IF EXISTS url_log;
CREATE TABLE url_log(
log_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id INT UNSIGNED NULL,
url VARCHAR(100) NOT NULL)
ENGINE=ARCHIVE
PARTITION BY RANGE ( SECOND(log_date) ) (
    PARTITION p0 VALUES LESS THAN (10),
    PARTITION p1 VALUES LESS THAN (20),
    PARTITION p2 VALUES LESS THAN (30),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

Create a simple stored procedure to randomly generate some data. The procedure is not efficient, using RAND() and SLEEP(), but it does generate some data.

DROP PROCEDURE IF EXISTS load_url_log;
DELIMITER $$
CREATE PROCEDURE load_url_log (insert_count INT)
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i < insert_count
  DO
     -- SLEEP(RAND()) returns 0 (harmlessly appended to the url) but spreads
     -- the TIMESTAMP values across different seconds
     INSERT INTO url_log(user_id, url)
     VALUES (FLOOR(RAND()*100), CONCAT(REPEAT('x',FLOOR(RAND()*99)),SLEEP(RAND())));
     SET i = i + 1;
  END WHILE;
END $$

DELIMITER ;

CALL load_url_log(500);

A quick check shows a distribution of data (a partition-level check follows the output below).

mysql> select distinct(second(log_date)) from url_log;
mysql> select distinct(second(log_date)) from url_log where second(log_date) < 10;
+--------------------+
| (second(log_date)) |
+--------------------+
|                  0 |
|                  1 |
|                  2 |
|                  3 |
|                  4 |
|                  5 |
|                  6 |
|                  7 |
|                  8 |
|                  9 |
+--------------------+
10 rows in set (0.00 sec)
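
As a quick cross-check, a sketch using INFORMATION_SCHEMA (available in 5.1; for ARCHIVE the row counts are estimates, so treat them as indicative only) lists the per-partition counts:

SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE()
AND table_name = 'url_log';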

And now the purpose of the test. Deleting data via deleting a partition.

mysql> ALTER TABLE url_log DROP PARTITION p0;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select distinct(second(log_date)) from url_log where second(log_date) < 10;
Empty set (0.00 sec)

And it works. Re-creating however did not.

ALTER TABLE url_log ADD PARTITION (PARTITION p0 VALUES LESS THAN (10));
ERROR 1481 (HY000): MAXVALUE can only be used in last partition definition

OK, so in my example I was lazy; I didn’t create specific partitions as I would in the real world, e.g. 31 partitions for DAYS. Let’s simulate it a little better.

DROP TABLE IF EXISTS url_log;
CREATE TABLE url_log(
log_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id INT UNSIGNED NULL,
url VARCHAR(100) NOT NULL)
ENGINE=ARCHIVE
PARTITION BY RANGE ( SECOND(log_date) ) (
    PARTITION p0 VALUES LESS THAN (10),
    PARTITION p1 VALUES LESS THAN (20),
    PARTITION p2 VALUES LESS THAN (30),
    PARTITION p3 VALUES LESS THAN (60)
);
ALTER TABLE url_log DROP PARTITION p0;
ALTER TABLE url_log ADD PARTITION (PARTITION p0 VALUES LESS THAN (10));
ERROR 1493 (HY000): VALUES LESS THAN value must be strictly increasing for each partition


Still doesn’t work. RTFM indicates that LIST partitioning is the way to go.

DROP TABLE IF EXISTS url_log;
CREATE TABLE url_log(
log_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id INT UNSIGNED NULL,
url VARCHAR(100) NOT NULL)
ENGINE=ARCHIVE
PARTITION BY LIST ( SECOND(log_date) ) (
    PARTITION p0 VALUES IN (0,1,2,3,4,5,6,7,8,9),
    PARTITION p1 VALUES IN (10,11,12,13,14,15,16,17,18,19),
    PARTITION p2 VALUES IN (20,21,22,23,24,25,26,27,28,29),
    PARTITION p3 VALUES IN (30,31,32,33,34,35,36,37,38,39),
    PARTITION p4 VALUES IN (40,41,42,43,44,45,46,47,48,49),
    PARTITION p5 VALUES IN (50,51,52,53,54,55,56,57,58,59)
);

ALTER TABLE url_log DROP PARTITION p0;
ALTER TABLE url_log ADD PARTITION (PARTITION p0 VALUES IN (0,1,2,3,4,5,6,7,8,9));

And that works.

So by creating an ARCHIVE table with 31 LIST partitions, one for each day of the month, you could use ARCHIVE to log data into DAY partitions, then analyze, summarize, log or copy the data from the previous day, and purge it within 28 days, as sketched below.
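
As a minimal sketch of that idea (untested here, and simply mirroring the SECOND() examples above with DAYOFMONTH() instead), the layout and the daily purge cycle could look like this:

DROP TABLE IF EXISTS url_log;
CREATE TABLE url_log(
log_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
user_id INT UNSIGNED NULL,
url VARCHAR(100) NOT NULL)
ENGINE=ARCHIVE
PARTITION BY LIST ( DAYOFMONTH(log_date) ) (
    PARTITION d01 VALUES IN (1),  PARTITION d02 VALUES IN (2),  PARTITION d03 VALUES IN (3),
    PARTITION d04 VALUES IN (4),  PARTITION d05 VALUES IN (5),  PARTITION d06 VALUES IN (6),
    PARTITION d07 VALUES IN (7),  PARTITION d08 VALUES IN (8),  PARTITION d09 VALUES IN (9),
    PARTITION d10 VALUES IN (10), PARTITION d11 VALUES IN (11), PARTITION d12 VALUES IN (12),
    PARTITION d13 VALUES IN (13), PARTITION d14 VALUES IN (14), PARTITION d15 VALUES IN (15),
    PARTITION d16 VALUES IN (16), PARTITION d17 VALUES IN (17), PARTITION d18 VALUES IN (18),
    PARTITION d19 VALUES IN (19), PARTITION d20 VALUES IN (20), PARTITION d21 VALUES IN (21),
    PARTITION d22 VALUES IN (22), PARTITION d23 VALUES IN (23), PARTITION d24 VALUES IN (24),
    PARTITION d25 VALUES IN (25), PARTITION d26 VALUES IN (26), PARTITION d27 VALUES IN (27),
    PARTITION d28 VALUES IN (28), PARTITION d29 VALUES IN (29), PARTITION d30 VALUES IN (30),
    PARTITION d31 VALUES IN (31)
);

-- Once day 1 has been analyzed, summarized and copied elsewhere, purge and re-create it
ALTER TABLE url_log DROP PARTITION d01;
ALTER TABLE url_log ADD PARTITION (PARTITION d01 VALUES IN (1));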

Giving control of your data to the cloud

I’ve been doing some research and evaluation of more cloud computing. Specifically my focus has been on data store, and considering how to augment an existing operation using a popular database such as MySQL.

I’ve been looking first at Google App Engine and now I have my SimpleDB Beta today will be looking here next.

Some observations I’ve struggled with are:

  • No native CLI, say for basic data setup. You can do some programmatic input for SELECT statements in a Query object using a SQL-like syntax called GQL, but you can’t do DML.
  • No simple data viewer. In production you would not do this, but I’m in evaluation and still looking at functionality, verification of results etc. A phpMyAdmin clone is what I’m seeking, for example. I suspect this would have been a good Google Summer of Code project.
  • Python only. While it only took me half a day to learn some Python syntax, it was just another small starting hurdle. If your organization doesn’t do or use Python, this is another skill or resource needed.

But the biggest concern and hurdle I’m understanding from my traditional principles is loss of control. Loss of control for monitoring and instrumentation, performance, availability and backup and recovery.

Recent issues with performance and unavailability have highlighted that App Engine is not suitable for mission critical web sites, at least not as your primary focus. I see huge benefits in augmenting access to information, perhaps more historical data for example; a well defined API within your application could easily support options to use cloud storage as secondary storage and primary retrieval of less important data.

My focus when revisiting here will be looking at means of object translation between tables and the Data Store, and maybe an API for data transfer etc.

I suspect that when I get to evaluating EC2/S3 more I will have much more support by being able to leverage existing tools and techniques.

Handling Disaster 101

I’ve had to accept the “practice what you preach” pill recently due to a disaster at my hosting provider. See Learning from a Disaster.

While it was my own personal site on the dedicated server in question, and not a business generating revenue, I found that my MySQL Backup Strategy was incomplete (it is also based on code 4 years old). I found that I had not tested my Disaster Recovery Plan adequately, even though I have used my backup and recovery approach successfully in the past for various controlled situations and testing.

So what mistakes did I make? There were two.

1. I was using a cold backup approach, that is, copying the entire MySQL database in a controlled manner at the file system level. These copies were also stored on an alternate shared hosting server. This works fine when your backup server supports a means of restoring data in this format; however, if your backup shared hosting facility does not give you access to the MySQL data area, then this does not work. Not wanting to pay for two dedicated hosts, this backup solution is impractical for my present hosting. Time to consider alternatives, such as being prepared with an EC2 image.

2. Recently I moved to using two MySQL instances, both 5.0 GA and 5.1 RC. The problem is I didn’t adjust my backup scripts appropriately to reflect two instances. Of course, when my server was unavailable for 43 hours I was completely screwed; at best I could only throw up a Site Unavailable page rather than my website. Combined with my hosting provider totally screwing up DNS, and admin access to manage DNS, for 3-4 days beyond the almost 2 days of total server unavailability, I also had to change support for my DNS. It would have been easier if I had a DNS DR plan.

The lesson is, no matter how trivial the information or website, if you actually want the content, make sure you test your recovery strategy ahead of a disaster.

Working with Google App Engine

Yesterday I took a more serious look at Google App Engine, I got a developer account some weeks ago.

After going through the getting started demo some time ago, I chose an idea for a FaceBook application and started in true eXtreme Programming (XP) style (i.e. what’s the bare minimum required for the first iteration?). I taught myself some Python and within just a few minutes had some working data being randomly generated, totally within the development SDK environment on my MacBook. I was not able to deploy initially via the big blue deploy button; the catch is you have to register the application manually online.

Then it all worked, and hey presto, I’ve got my application up at the provided domain hosting at appspot.com.

Having come from a truly relational environment, most notably MySQL in recent years, I found the Datastore API different in a number of ways.

  • There is no means of Sequences/Auto Increment. There is an internal unique key, but it’s a String, not an integer, so I couldn’t re-use it.
  • The ListProperty enables Lists in Python (like arrays) to be easily stored.
  • The ReferenceProperty is used as a foreign key relationship, and can then be referenced further within an object hierarchy.
  • I really missed an interactive interface. You have no ability to look at your data; specifically I wanted to seek some data, then delete some data, but I had to do all this via code.

Having developed a skeleton FaceBook application before in PHP, I figured a Python version would not be that much more work, but here is where I got stumped. The information at Hosting a Facebook Application on Google AppEngine, leveraging the PyFacebook project, didn’t enable me to integrate Google App Engine with FaceBook just yet.

This had me thinking I needed to resort to a standalone, simple Python Facebook application to confirm the PyFacebook usage. Now my problems started. Under Mac it’s a lot more complex to install and configure Python/Django etc. than under Linux. I tried to do it on my dedicated server, but drat, Python is at 2.3.4, and it seems 2.5.x is needed.

Still, it was a valuable exercise; I dropped the FaceBook goal and just worked on more Google App Engine stuff. Still early days, but it was productive to try out this new technology.

What I need to work on now is how to hold state within the Python infrastructure so I can manage a user login and store and retrieve user data for my sample app.

Corruption using MySQL AES_[EN|DE]NCRYPT functions

I was contacted this week by a previous client regarding a failure in processing data. This was a Contact, Financial and Media Management system I developed for a not-for-profit organization on a LAMJ stack, and I’ve had to do nothing since deployment 3 years ago: no bug fixes, no feature enhancements. The only thing lacking is additional management reporting, and data is extracted for this purpose now.

It runs on commodity hardware, Linux and MySQL, and its only unscheduled downtime was a power failure before UPS power was installed. However, this all changed last week. Processing of regular scheduled encrypted data simply failed unexpectedly.

A summary of the environment.

  • Data is inserted with AES_ENCRYPT(data,key);
  • Data is retrieved with AES_DECRYPT(data,key); (see the sketch after this list)
  • Data is never updated.
  • New data is always added, and historical data always remains.
  • The application has no end user access to modify data.
  • The application has no function anywhere to modify the data, so no rogue change could have occurred.
  • An AUTO_INCREMENT column and TIMESTAMP columns provide a level of auditing for data forensics.
  • Backup copies of data exist up to 3 years for reference.
  • The seed key has not changed.
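
To make the pattern concrete, here is a minimal sketch of the approach described above (the table, column names and seed key value are hypothetical, not the client’s actual schema):

-- AES_ENCRYPT() returns binary data, so the column is a BLOB
CREATE TABLE payment_detail (
  payment_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  created     TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  card_number BLOB NOT NULL,
  PRIMARY KEY (payment_id)
);

-- Data is only ever inserted, never updated
INSERT INTO payment_detail (card_number)
VALUES (AES_ENCRYPT('4111111111111111', 'seed-key'));

-- Retrieval decrypts with the same seed key
SELECT payment_id, created, AES_DECRYPT(card_number, 'seed-key') AS card_number
FROM payment_detail;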

The problem

Selecting the first 10 rows saved in the table (by AUTO_INCREMENT key and confirmed by dates), 8 of 10 are now corrupt. Selecting the last 10 rows inserted, zero are corrupt. Across 20,000 records, 75% are now corrupt.
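
For reference, a check along the following lines (a sketch against the hypothetical table above) is one way to quantify the corruption; it relies on AES_DECRYPT() returning NULL when a value can no longer be decrypted with the seed key:

SELECT COUNT(*) AS total_rows,
       SUM(AES_DECRYPT(card_number, 'seed-key') IS NULL) AS corrupt_rows
FROM payment_detail;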

A lot of analysis has been performed to identify and track the data that was recorded, a certain amount of data forensics, and it was confirmed that information was successfully processed last month, for example. As this system performs financial transactions, there is a lot more auditing available and being reviewed, however it is simply a mystery that I can’t solve.

  • What options remain? Is this a hardware problem: disk, or even memory?
  • What other data may be corrupt?
  • How can more investigation occur to track the cause of the problem?

mysql> select version();
+------------------+
| version()        |
+------------------+
| 4.1.10a-standard |
+------------------+
1 row in set (0.00 sec)

Updated Website and Professional Blog

For those that have my existing blog bookmarked, or use any RSS/Atom feeds please update your information now.

My new blog can be found at http://ronaldbradford.com/blog/

RSS 2 is http://ronaldbradford.com/blog/feed/rss2, Atom is http://ronaldbradford.com/blog/feed/atom

OK, so before you read on, please change now. While all old links will redirect, I encourage you to take a few moments to update your browser and RSS reader.

So what’s new? Well, as Baron stated in I moved this blog to pairLite with zero downtime, and it was easy, I moved my old blog site blog.arabx.com.au and my temporary staging site ronaldbradford.com, and transitioned my passworded beta site, all with no downtime. Even with all my testing over several weeks, and several test users, my production migration had one hitch with some internal url re-directs not working correctly, but my overall site main links were not affected. You would have had to be quick at 1:30am on Saturday to catch it.

Other features include a consolidation of information from various sites, now all under my brand ronaldbradford.com, allowing me to expand on that. I’ve also split my professional blog from my personal views, photos and comments, as well as adding new SEO/SEM optimizations. More on these experiences later. My theme was actually inspired by some business cards I had made in Australia while home last week: 300GSM color cards that were designed and printed in just a few hours and are of very good quality (which was very impressive).

I’m now free to work on so many other projects, including many MySQL draft posts, now that Version 1.0 of my new site is out. As with all release early, release often sites, expect 1.1 soon!

Everything open source from Sun

In the recent interview Missed Twitter Questions from Jonathan Schwartz Interview at Web 2.0 Expo Sun CEO Jonathan Schwartz is quoted as saying “Everything Sun delivers will be freely available, via a free and open license (either GPL, LGPL or Mozilla/CDDL), to the community.”

Presently not everything is under one of these licenses. Java getting fully Open Source highlights that Java is getting there, but it still contains closed source components. Open Solaris, NetBeans, Glassfish, VirtualBox and Open Office are; even the mainline Solaris 10 is. Star Office is one that is not.

MySQL is also not there, and presently has at least three different licenses. You have GPL for the MySQL Community Server. You have a subscription model for MySQL Enterprise, which includes additional bug fixes not available in the Community Server. The subscription model also includes the MySQL Enterprise Monitor, software that will fail to operate if your subscription lapses. Additional upcoming MySQL Proxy features will also be subscription only, or have been termed “closed source”. You also have the different licensing model used by MySQL Workbench.

I know many that work on these products at MySQL do not agree with the various license policies. Indeed, recent comments from Rich Green of Sun have also indicated “we are not going to change anything at MySQL”.

I was initially interested in whether Sun would move MySQL to the CDDL; however, this question was recently raised and Marten Mickos stated there were no plans to move away from the GPL v2.

I can only hope that as Sun continues to promote itself as the largest open source company, these differences subside and disappear, and don’t continue to evolve and change.

Migrating my blog & updating WordPress

I’m migrating my existing WordPress-run blog site at blog.arabx.com.au to my new site ronaldbradford.com (which is not yet publicly available).

As part of this process I’ll be doing a number of upgrades/changes including:

  1. Update blog software to 2.5.1 from 2.0.2 (I’d previously done a 2.0.2 upgrade to 2.3.2, but not deployed)
  2. Migrate to new domain
  3. Upgrade existing MySQL 5.1 version from 5.1.11 to 5.1.24
  4. Migrate database to using MySQL 5.1.24, from 5.0.22 (my server runs 5.0 and 5.1 instances)
  5. Split my blog into Professional & Personal

Upgrading
The upgrade is straightforward: back up the database, download the latest WordPress software. I keep full revisions of older versions via directories + symlinks, so my installation is more complicated, but fully recoverable. Install, and run the upgrade script. That all works, but my site breaks. Suspecting it is my heavily customized theme, I disable that and my site is up. One to add to the TODO list.

Migrate to new domain
Dump + Reload data into new schema. Copy WordPress install. I had to make two data changes to correctly use the new domain.


update wp_options set option_value='http://ronaldbradford.com/blog' where option_name='siteurl';
update wp_options set option_value='http://ronaldbradford.com/blog' where option_name='home';

Upgrade MySQL 5.1 version
That was also relatively straightforward. I was surprised I was running such an old 5.1.11, but I remember originally using 5.1.6 in production use before that.

Migrate database to 5.1 from 5.0
This is where my problems began. WordPress does not appear to like host+port connection settings. I can confirm access via the MySQL client. At 9:30pm Sunday night, this may have to remain in the unresolved bin for a few days.

Log Buffer #94: a Carnival of the Vanities for DBAs

April 25th, 2008 – by Ronald Bradford

Welcome to the 94th edition of Log Buffer, the weekly review of the database blogosphere. Adding to the list of usual database suspects, I have some more alternative considerations for our readers this week.

We start with Conferences

Still some discussion from last week’s 2008 MySQL Conference & Expo.

Baron “xaprb” Schwartz calls it correctly in Like it or not, it is the MySQL Conference and Expo. Matt Asay of c|net gives us some of his opinions in three posts: Two great posts on MySQL, Back to the future for MySQL and Between two consenting corporations…, in follow-up to last week’s active Slashdot discussion. Many others have also commented, if you have not been following the news released before the opening keynotes.

If you didn’t get a hard copy, Sheeri Kritzer Cabral has published the Pythian EXPLAIN Cheatsheet many attendees received.

Also last week was Collaborate 08 – Technology and Applications IOUG forum for the Oracle Community.

This week we also see Web 2.0 San Francisco in action, and excitement is also brewing for PGCon, the PostgreSQL Conference for Users and Developers, happening in under a month, as Robert Treat has Plane tickets booked for PGCon. Postgres was also visible at the MySQL Conference & Expo, if you were looking, with a prominent consulting team downing the blue elephant during the event. Wish I’d taken a photo now!

Still more news from Adam Machanic of the Pythian group with SQLTeach Toronto: Almost Here.

Common threads

The 2008 Google Summer of Code announced this week showcases the Open Source databases MySQL (14 projects) and PostgreSQL (6 projects). Kaj Arnö talks more in Fourteen Summer of Code projects accepted 2008. The company PrimeBase Technologies also features strongly with two projects for the Blob Streaming storage engine for MySQL as I detail in Media Blob Streaming getting a Google boost.

MySQL

DTrace Integration with MySQL 5.0 – Chime demo in MySQL Users Conference 2008 by Jenny Chen is an example of Sun’s Open Source contribution to MySQL, which I saw as a physical demo last week. Unfortunately, due to the imbalance in actually getting new functionality into Community contributions (actually non-existent in the current or next MySQL version :-( ), this functionality is only really for show. Dtrace with MySQL 6.0.5 – on a Mac describes some of this work actually making it into the next, next version. It seems this next Falcon Preview is available but not announced by MySQL generally, as I note in Continued confusion in MySQL/Sun release policy.

MySQL gurus Mark Callaghan and Brian Aker comment, respectively, here and here on MySQL Heap (Memory) Engine – Dynamic Row Format Support, work submitted by Igor Chernyshev of the eBay Kernel Team (whom I’ve met previously and was most impressed with his ability to submit MySQL patch work with little previous MySQL kernel knowledge, but extensive C++ knowledge). This work also contributed to eBay Wins Application of the Year at MySQL Conference & Expo.

Mark also mentions in his post “How do users get it? There is no community branch into which people can submit changes with a GPL license.” A topic yours truly has also mentioned regarding Community contributions, development and release. Perhaps a sign of more benefit to the community soon, as Monty mentions.

Baron Schwartz comments on Keith “a.k.a Kevin” Murphy’s work in the Spring 2008 issue of MySQL Magazine. With a quick plug also for his upcoming book “High Performance MySQL – Version 2” (me giving it a plug also now), Baron also has the best published anti-spam sniffer email I’ve seen, recently updated to his new employer. Check his blog and let me know.

Postgres

Joshua Drake of Command Prompt Inc., The Postgres Company, gets excited in Is that performance I smell? Ext2 vs Ext3 on 50 spindles, testing for PostgreSQL, and gives us some insight into different settings of two popular file system types. It would be great to see a follow-up with a few more different filesystem types.

Pabloj “so many trails … so little time” extends his MySQL example to Postgres in Loading data from files. And on Postgres Online Journal, we get An Almost Idiot’s Guide to PostgreSQL YUM, giving you a step-by-step guide to PostgreSQL setup, including the all-important “Backing up Old Version”.

Oracle

We get a detailed book chapter from Keith Lake of Oracle OLAP, The most powerful open Analytic Engine, in his extensive post on Tuning Guidance for OLAP 10g. David Litchfield brings attention to A New Class of Vulnerability in Oracle: Lateral SQL Injection. The title is sufficient for all Oracle DBAs to review.

Don Seiler gives his experience in Bind Variables and Parallel Queries Do Not Mix, when an Oracle bug is discovered moving the database to 64-bit H/W.
Matching LOB Indexes and Segments by Michael McLaughlin gives us a good CASE/REGEX SQL example, and simple output to monitor the growth of LOBs in your Oracle database.
Additional reading for Oracle folks can be found with Kenneth Downs writing Advanced Table Design: Resolutions, and Dan Norris’ Collaborate 08 thoughts gives a concise review of a largely attended Oracle event.

SQL Server

B Esakkiappan’s SQL Thoughts gives us a thorough lesson on SQL Server 2005 database transaction logs with Know the Transaction LOG – Part 1, Part 2, Part 3 and Part 4 Restoring Data.

Paul S. Randal of SQL Skills adds Conference Questions Pot-Pourri: How to create Agent alerts to his writings following many requests after a recent workshop.

In Scalability features I would like to have in SQL Server Michael Zilberstein lists 3 key features including “Active-Active cluster”, “Indexes per partition” and “Bitmap indexes and function based indexes”.

Ingres, Times Ten, Google App Engine and more

Some movement in the Ingres world, with Deb Woods of the Ingres Technology Blog discussing in Inside the Community – Ingres style… the Ingres Engineering Summit occurring this week. Attendees ranged from newbies to a 24-year Ingres veteran. That beats my experience in Ingres, which now extends to 19 years.

We get another very detailed installation description, this time for Times Ten in Install Oracle TimesTen In-Memory Database 7.0.4 on Linux.

Just a few weeks ago, a new database offering hit the market with the Google App Engine. News this week includes Google App Engine Hack-a-thons! being announced, with events in New York on May 7th and San Francisco on May 16th. As a developer with an account and an excuse to use it more, I can’t win, being in the right towns on the wrong dates.

OakLeaf Systems this week writes Comparing Google App Engine, Amazon SimpleDB and Microsoft SQL Server Data Services. Another good read just for comparison.

Not in a blog, but in discussion at the recent MySQL Conference, was mSQL. It was interesting to find out that PHP was originally developed for mSQL first, and only adopted MySQL as the preferred database after some functionality requirement. Interesting what could have been?

In Conclusion

Thanks Dave for the opportunity to contribute to the week in review. Until my chance to charm the readers next time.

I leave you with a photo, and challenge our readers to find another person who would be capable of wearing a t-shirt that states “My free software runs your company”. Michael Widenius, founder and original developer of MySQL, can, and my thanks to you for MySQL, and the vodka shots at the conference last week.

Happy Earth Day 2008!

Making business decisions for the community and the enterprise

I was prompted, following a few key words by Marten Mickos at the Sun Dinner on Wednesday evening and a subsequent one-on-one discussion with Marten, to post my thoughts on some significant news announced this week at the MySQL Conference: the decision to provide what have been termed “Enterprise only features”. It is unfortunate this was not discussed in Marten’s opening keynote, having been exposed the evening before in the Partner’s meeting and hitting the blogosphere before the conference officially started.

MySQL, past, present and future as an Open Source company requires a functional business model to succeed. This includes the funding of resources and the technology progression. It is also necessary in this business climate to build a successful business quickly. How do you do this? Well that’s probably the difference between a successful CEO and an unsuccessful one, and what Marten Mickos has produced is clearly very successful.

I may not necessarily agree with the decisions made, more specifically not understanding at this time the rationale for which features are free and which are likely to be commercial, but I respect the decision made by Marten Mickos. These new feature considerations are for a future release of MySQL, and I’m sure they are not yet set in stone; however, MySQL cannot be all things to all people, no software can. It reminds me of the Homer Simpson car, designed to do everything Homer wanted in a car, but which bankrupted the previously successful company due to the views of one individual solving all their own needs, but not the needs of the majority. Who is affected by this decision, and who will benefit? Again, it’s too early to tell.

Monty indicates this is a MySQL decision, not a Sun decision. This indicates the transition of MySQL to being under the Sun banner of the largest open source company is, well, still in transition. Today it was again confirmed to me that the MySQL database will always be GPL, and MySQL will never revoke, as free software, the database server functionality that powers the world’s largest websites.

As an advocate for the MySQL community, I’d like to consider myself one of the pulses, a thought in the MySQL consciousness and even a vocal lobbyist. I am however not interested in being a disruptor in the MySQL ecosystem. A Communications Lesson, I believe, correctly states of the Slashdot headline “Sun to Begin Close Sourcing MySQL” that the headline is wrong.

A number of people have posted their comments; let’s stop bickering about it, and let’s see something positive happen for the benefit of the community. For example, as I’ve mentioned previously regarding the lack of differentiation for the Community version, and as mentioned again by Mark Callaghan in A better (community) HEAP engine, a worthwhile patch can’t be of benefit to the community in a binary release for the lay person.

When in management, I am responsible for contributing to the success of the company, to play a significant role in the functional business model, and to ensure funding for resources and technology progression. In other terms, how can revenue generation be achieved to fund, most prominently, the salaries of staff, including my own? How am I going to do this? This will be the difference between huge success and not.

Continued confusion in MySQL/Sun release policy

In review of some list posts today, I came across the Falcon Preview 6.0.5 downloads available from the MySQL Forge (even that is unclear, but the directory on the forge indicates this). The Forge Wiki documentation describes the 6.0.5 release features (without a download link), however the official MySQL downloads for 6.0, and the directory structure for all MySQL releases, only describe 6.0.4.

There is nothing in MySQL News and Events about this new release. The documentation for 6.0.5 is also unclear, as it lists this version as Not Yet Released.

This comes on the heels of the “Pending General Availability/near-final release candidate” wording (see here and here), and the press release Sun Celebrates Third-Party MySQL Storage Engines, only to be negated by Sun Microsystems Announces MySQL 5.1, which broke the plugins as they were not compatible.

I don’t wish to create confusion, but it’s unclear what is happening here, notwithstanding that the new preview only has 5 builds and excludes Mac OS X, which means I can’t natively test it.

Media Blob Streaming getting a Google boost

The 2008 Google Summer of Code MySQL projects are now available. MySQL has 14 listed projects and is one of the ~190 different Open Source products listed. Unfortunately there is no summary showing the total number of projects being sponsored across all products.

Media Blob Streaming actually has the luxury of two approved projects, so they have plenty of mentoring work at PrimeBase Technologies.

Raj Kissu Rajandran will be working on BLOB Streaming Support for phpMyAdmin and KishoreKumar Bairi on a Streaming Enabled MySQL Driver for PHP. Welcome to the world of open source for your respective projects.

The database frontier

Jay’s opening lines regarding the final MySQL Conference keynote speaker were: “I work with a lot of data. I think peta-bytes, maybe exa-bytes”. This related to Jacek Becla from the Stanford Linear Accelerator Center, giving his presentation on “The Science and Fiction of Petascale Analytics”.

The goal of the Large Synoptic Survey Telescope (LSST) is the storage of 50+ PB of images and 20+ PB of data.
Let’s just clarify the size: 20 PB of data = 20 years of HD movies = 2,000 years of 128kbps MP3s.

The next database frontier is obviously building huge databases. What part will MySQL or other relational databases play? Some interesting facts:

  • The digital universe created 161 exabytes of data last year.
  • Google processes 20 petabytes of data per day.

The operational plan for the LSST project timeline is 10 years, with production only starting in 2014. The timeline:

  • 2009 Choosing technology
  • 2010-2014 Construction
  • 2014-2023 Production

The primary goals are: scale, parallelism and fault tolerance.

Q: What does a MySQL Fellow do?

A: Maria, an ACID, MVCC engine that plans to be both the default non-transactional and the default transactional engine for MySQL.

Presently developed by a team of 6 people, with plans to add 2-3 developers, the work on Maria should see the 1.5 release this month.

It was great to hear Monty say “We have a policy of zero MySQL Bugs, like the old MySQL way.”

Maria Version History
1.0 – “Crash Safe” – part of an existing 5.1 branch
1.5 – “Concurrent insert/select” to be merged as part of formal MySQL 6.0 release
2.0 – Transactional and ACID
3.0 – High Concurrency & Online Backup
4.0 – Data Warehousing

The schedule has all of these features available by the next MySQL Conference in Q2 2009.

Some points of note:

  • This is a MyISAM replacement.
  • It was interesting to hear about log file size (the suggestion being big, like 1G), and that the logs are not circular. New log files will be created, and files purged only when no longer used.
  • There has been a change in default page size, presently defaulting to 8K for both data and indexes.
  • Maria 1.5 does not support INSERT DELAYED, and FullText and GIS indexes are not crash safe.
  • There are extensive tests in the MySQL Test Suite
  • Will support READ COMMITTED and REPEATABLE READ (available in 1.5)
  • Every part of the development and process is open, with available documentation (unlike some other storage engines, Monty mentions).
  • There is a drop-everything policy on new bugs, to keep Maria as stable as possible.
  • Check out the blog at monty-says.blogspot.com

Tips from the MySQL Conference

It would be great if people could create a single line (one tip) from each talk and we could aggregate these into an executive summary for tech people.

This was prompted by only a few minutes looking in on Baron Schwartz’s EXPLAIN presentation. What I didn’t know was:

EXPLAIN EXTENDED SELECT …; SHOW WARNINGS; gives the rewritten SQL query
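
As a quick sketch (re-using the url_log table from the ARCHIVE post earlier, purely as an example):

EXPLAIN EXTENDED SELECT user_id, url FROM url_log WHERE user_id = 42;
-- The Note (code 1003) in the warnings contains the query as rewritten by the optimizer
SHOW WARNINGS;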

If only I had time to whip out an application on my Google AppEngine and get twitter feeds with, say, a mysqlconf keyword. Perhaps we need an all-night BoF hackfest to do it.