Opinions, Expertise, Passion.

Information in black and white, and sometimes some color.

Jun
30

The iPhone

Link to this post

Apple today released the iPhone. It was released at 6pm on Friday June 29. Who releases a product at 6pm on a Friday? While walking up 5th Avenue before dinner I noticed a line of people outside a Cingular (now the new AT&T) store (the only phone network you can get an iPhone for).

Apparently people had been lining up at the New York Apple store for two days. Fools. So after dinner tonight at 11pm I headed up to the NY Apple store with Andrew and Jo, given they had never been there before. Lucky the store here is open 24 hours.

No lines, but inside it was the normal mayhem with some extra fanfare around the obvious new iPhones. Check out photo.

A few minutes later I got to play with one, really cool. Smaller then I expected, I had some difficulty understanding how to zoom in on web pages, I needed to learn the pinch and expand step. The keyboard is smaller then my Nokia E62 PDA so that was a little annoying.

You could even get a free box bag with a picture of an iPhone on it. Check it out.

Of course the box is free when you buy an iPhone. I got the 8GB model at $599 US, plus case plus tax set me back $681.82

Ok, so that’s nice, just had to wait the 1hr 15min to get home.

It’s now 1am, and the first requirement is to download the latest iTunes. Reboot my laptop into Windoze, and of course it’s like a 56MB file. While downloading I grabbed the image from my camera, upload the image to my Fotolog account, and started this blog. Download and install complete, and guess what, Windows has to restart your computer. There is just one more reason why I hate Windoze and simply don’t use it. Reboot later, and finally I get to start.

Activation of phone is via iTunes. Of course this doesn’t work. Please Call AT&T. This was to be expected as my phone is under an a corporate account. I did specifically check with the store before purchasing that I could use this account for the required 2 year contract commitment.

Of course the number presented on screen, and that I rang (1.877.419.4500) could not help me because I’m a business account. I have to call the Business Center on 1.866.907.3484. Ring that, and guess what I’m on hold, 20+ mins later still nothing. It’s now 2:14am, I hang up, I was on the phone waiting for 27 minutes. Rather disappointing, I guess that’s one reason why you start a sales promotion at 6pm on a Friday. All the phone support the Business Center have gone home for the weekend.

This had better turn out for the better in the morning. I’ll be rather peeved if I find out, I need some account information I simply don’t have then have to wait until Monday to call my office (of course wait until midday as that is 9AM Pacific time).

Right now failure, after such a great evening with friends, food and just excellent service it’s just one of those bummers in life.

Update 1. I’ve added video of the NY Apple Store iPhone activity.
Update 2. My phone is still not activated. Read about it at iPhone for corporate users. What a debacle.
Update 3 It’s alive iPhone activation - Finally after 3 days

Posted under Apple, Professional, iPhone on 30 Jun 2007
Comments (0)
Jun
29

MySQL Proxy. Playing with the tutorials

Link to this post

I was playing with the 5 sample tutorial Lua scripts available here with the MySQL Proxy, but I was doing something a little inefficiently.

I started mysqld, then I started the MySQL Proxy with the lua script, then connected to MySQL via the proxy. To test a different script I was actually killing the MySQL Proxy and restarting with appropriate script, but this is unnecessary. MySQL Proxy will re-read the lua script, as specified with –proxy-lua-script on new connection. All I need to do is copy in the file in question and get a new mysql client connection.

The tech version of the right way:

$ cp tutorial-basic.lua running.lua
$ ./mysql-proxy --proxy-lua-script=running.lua &
$ mysql -uusr -p -P4040 -h127.0.0.1
mysql> # do my stuff
mysql> exit;
$ cp tutorial-inject.lua running.lua
$ mysql -uusr -p -P4040 -h127.0.0.1

In the tutorial-resultset.lua example, we see the creation of a new command, show querycounter which gives you the number of queries executed for the connection. Immediately I can see that people will be creating pseudo Com_insert,Com_select,… variables per connection so you can get a better granularity of information then what default status variables can provide.

It was interesting however to see the counter be a value of 20 when you open a connection. What you may ask, but we forget the little things sometimes when running the mysql client. On invocation it actually executed 20 commands to the server, that is 20 round trips.

There are indeed: SHOW DATABASES; SHOW TABLES; then 16 Field Lists for each mysql table (e.g. host,db,func…), then a SELECT @@version_comment. Most of these as a result of the rehash or tab-completion capabilities in mysql client.

Wow, I also learned something new, there is a @@version_comment, I just knew about @@version. You learn some new trivial point every day!

 mysql> select @@version,@@version_comment;
+------------+------------------------------+
| @@version  | @@version_comment            |
+------------+------------------------------+
| 5.0.37-log | MySQL Community Server (GPL) |
+------------+------------------------------+
1 row in set (0.00 sec)

MySQL Proxy. Get it here.

Posted under Databases, MySQL, MySQL Proxy, Professional on 29 Jun 2007
Comments (1)
Jun
29

The algorithm is banned in China

Link to this post

This is an image I took yesterday of a billboard “The algorithm is banned in China.”. I don’t get it. This one has “Ask” in the bottom of the image.

I remember a few months ago seeing “The algorithm killed Jeeves”, and it had no reference to “Ask” on it, so I assumed it was some reference to Google Killing askjeeves.com. Need to find that photo.

Found it. I took this photo below on Apr 20, 2007. Note: there is no Ask logo like the one above.

Posted under Google, Professional on 29 Jun 2007
Comments (0)
Jun
29

It doesn’t take long

Link to this post

Google Van is one of the sites now sporting images found as part of Google Street View which has been on line for about a month now.

What is the world coming to!

Posted under Google, Professional on 29 Jun 2007
Comments (0)
Jun
29

What’s your disk I/O thoughtput?

Link to this post

MySQL uses disk. Ok, so everybody knew that. MySQL uses disk in two primary ways.

  • Random I/O (Reading & Writing Data/Index blocks)
  • Sequential I/O (Binary Log, InnoDB Redo Log)

Historically it’s been best practice to separate these onto different spindles, and also separating the OS and tmp space onto a third spindle. With commodity H/W that can be easily done, but today a lot of people use SAN. Is this a good thing for a MySQL Database?
That’s a topic of much discussion at a later time, however I’ll add two points. A lot of SAN configurations are RAID 5, and RAID 10 is a better choice due to removing the requirement to calculate the parity. Second, last week I observed a RAID disk failure and it took an incredible long time for the disk to be re-built. Just how many SAN uses our there have actually timed a disk rebuild on a loaded system and seen the impact on the system in general.

Back on topic, I don’t have access to any variety of hardware, so community here is where you can help. Those that can spare 5 mins, and have some free disk space (< 5GB), here is an exercise.

Commands

$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
$ ls -lh testfile.32k
$ rm testfile.32k

You should see something like (FYI: from a 5400rpm laptop drive)


$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 160.172 seconds, 20.5 MB/s
real 2m40.342s
user 0m0.120s
sys 0m15.277s
$ ls -lh testfile.32k
-rw-r--r-- 1 usr grp 3.1G 2007-06-28 10:02 testfile.32k

If your output doesn’t provide the dd M/B output (like Solaris for example) if you could also add:

$bc
3.2*1024/160
20
^D

NOTE: Replace 160 with the number of seconds from the real time (e.g. 2*60+40)

Of course I’m not collecting a lot of stuff, like comparing different block sizes, or looking at iostat for existing load and introduced load. I thought I’d ask an easy question to see what type of response and output I’d find.

If you want to add any additional information such as Drive Types & Speeds (e.g. SATA 5400rpm), RAID configuration or SAN configuration (e.g. RAID and connection type), OS and File System type that would be great, but completely optional.

If you would like add your results anonymously, please email me directly.

Thanks in advance.

Posted under Databases, MySQL, Professional on 29 Jun 2007
Comments (8)
Jun
28

Top 5 Best Practices for MySQL

Link to this post

We had the Top 5 wishes for MySQL started by Jay recently. So in true chain letter fashion I’m starting a new one this week. “The top 5 Best Practices for MySQL”. This like it’s predecessor is generally vague, so it can include points on development, design, administration etc.

My list:

1. Write your application to support Transactions (and therefore use a Transactional Storage Engine).

2. Always use SQL_MODE. e.g. at least TRADITIONAL and ANSI to ensure better data integrity and errors as errors.

3. Use the most optimal data types (particularly for number (e.g. TINY/SMALL/BIG INT and nullability) and especially in relation to columns in indexes.

4. When using InnoDB use the shortest primary key possible (e.g. INT UNSIGNED. BIGINT unless you have more then 4 billion rows in your potential data set is laziness).

5. VARCHAR(255) is just plain dumb and lazy. This is not database design and for the record, yes there is an impact when your queries use certain buffers (e.g. sort_buffer). Last year I wrote on this topic in If you don’t know your data, you don’t know your application.. Combined with SELECT * FROM TABLE in queries is not a well designed application.

Obviously I need to clarify that this is a baseline for Best Practice and many considerations can lead to a more optimized means depending on circumstances, for example using MyISAM or other MySQL non-transactional storage engines etc, when not to use sql_mode etc and when the shortest Innodb is not the best when you are being killed by I/O. For points 3 and 5, there are no exceptions.

To all Planet MySQL bloggers and readers, the challenge is on.

Posted under Databases, MySQL, Professional on 28 Jun 2007
Comments (0)
Jun
27

Got MySQL Proxy yet!

Link to this post

If you haven’t got MySQL Proxy yet, then stop and get it now. Jan announced the release a few days ago of this new product offering from MySQL.

I first heard about MySQL Proxy at the recent MySQL Conference 2007 and actually used it a few weeks later to help address slow running queries during benchmarking with a granularity of milliseconds — Wow. The product has grown immensely since then and I’ve watched in true amazement at the speed of development by Jan, who I only found out recently was the creator of Lighthttpd.

Now it’s past midnight and I think of those things I’ve wanted, can the proxy now do them. Here are a few ideas for me to play with.

  • Find Slow Queries < 1 second
  • Add a time (HH:MM:SS) column to SHOW GLOBAL STATUS output, and really hope it comes out in mysqladmin extended-status. Cross fingers here
  • Be able to log slow queries to a Memory table, a little like mysql.slow_log but more intelligent, and in 5.0.
  • True SQL access to things like PROCESSLIST, SHOW GLOBAL VARIABLES and SHOW GLOBAL STATUS, again like the 5.1 I_S tables, but I want them now in 5.0

The mind boggles. As Giuseppe describes it’s also an excuse to learn a new language called Lua. Those familiar with the MySQL GUI Products and have played with the GRT would have already seen Lua in action, as early as the 2006 User Conference.

Woot! Time to hit post and check out the real www.woot.com

Posted under Databases, MySQL, MySQL Proxy, Professional on 27 Jun 2007
Comments (0)
Jun
27

Query Cache is deterministic

Link to this post

I was reading some points on a Blog post today here. I didn’t see it via Planet MySQL but via a Google Alert.

The post has several good beginner points on MySQL Performance, but included the point “Literal current date is better than using the CURRENT_DATE() function because literals will use the query cache but functions won’t.”

How true, I must admit I’d not considered this in reviewing code for sites that use the Query Cache heavily. Non deterministic functions invalid use of the Query Cache (Except for recent Bug #29053 which is also an interesting read. Note to self, read the new bugs more often when free time permits). Back on point.

So if you have a query like “Select the news for today” and use CURDATE(), it will not be stored in the Query Cache, yet if your application code determines the current date and passes “Select the news for 2007-06-26″ for example it will.

Sometimes the simple things are easy to forget, or not have thought about in this case for me.

Posted under Databases, MySQL, Professional on 27 Jun 2007
Comments (0)
Jun
22

My ‘Hourly’ MySQL Monitor Script Version 0.05

Link to this post

I’ve been able to steal some more time to work on my script following My ‘hourly’ MySQL monitor script Version 0.03 almost a month ago.

The purpose of this script is simple. Create an hourly ‘cron’ job that records and monitors information against the OS and MySQL Server. This is most helpful when environments simply don’t have any monitoring in place. I’ve found it very productive when running a benchmark on site to simply enable for an hour. An amount of analysis is required, but I at least have a baseline of data collection. That is the first goal.

So hourly.0.05.tar.gz is taking some shape, and has the following new features:

  • Provide configurable flag to enable/disable OS and MySQL tests
  • Added MYSQL_SID support, allowing for running MySQL tests against multiple instances on a single server
  • Added configurable mysql.conf support, again supporting multiple server instances
  • Revised file name standards, again to support multiple server instances
  • Added additional error checking for correct MySQL Configuration and operational environment
  • Corrected file pathing of dependent files

My next step now is to provide some simple analysis, I’ll be starting with Statpack Version 2 by colleague Mark Leith.

Posted under Databases, MySQL, Professional on 22 Jun 2007
Comments (0)
Jun
21

Watching for Disk Based Tables being created

Link to this post

I didn’t know you could actually do this before addressing this problem in a benchmark using 5.0.36.

MySQL allows you to update two tables with a single UPDATE statement. I knew you could reference two tables in an UPDATE statement but not update both. However when working with a client after benchmarking I observed a large number of Created_tmp_disk_tables via SHOW GLOBAL STATUS and found that this query was the offending query only for certain circumstances. Understanding took a little longer.

schema.sql

DROP TABLE IF EXISTS a;
CREATE TABLE a(
id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
i   INT UNSIGNED NOT NULL,
c   CHAR(1) NOT NULL)
ENGINE=INNODB;
DROP TABLE IF EXISTS b;
CREATE TABLE b(
id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
i   INT UNSIGNED NOT NULL,
c   CHAR(1) NOT NULL,
t   TEXT NOT NULL)
ENGINE=INNODB;

data.sql

INSERT INTO a VALUES(1,1,'a');
INSERT INTO a VALUES(2,2,'b');
INSERT INTO a VALUES(3,3,'c');
INSERT INTO a VALUES(4,4,'d');
INSERT INTO b VALUES(1,1,'a',REPEAT('The quick brown fox jumped over the lazy dog',100));
INSERT INTO b VALUES(2,2,'b',REPEAT('The quick brown fox jumped over the lazy dog',200));
INSERT INTO b VALUES(3,3,'c',REPEAT('The quick brown fox jumped over the lazy dog',300));
INSERT INTO b VALUES(4,4,'d',REPEAT('The quick brown fox jumped over the lazy dog',400));

test.sql

SHOW SESSION STATUS LIKE '%tmp%tables';
SHOW SESSION STATUS LIKE '%tmp%tables';
UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20 WHERE a.id = b.id and a.id=1;
SHOW SESSION STATUS LIKE '%tmp%tables';
UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20,b.t=REPEAT('b',5000) WHERE a.id = b.id and a.id=3;
SHOW SESSION STATUS LIKE '%tmp%tables';
UPDATE b SET t = REPEAT('a',5000) WHERE b.id=2;
SHOW SESSION STATUS LIKE '%tmp%tables';

Output

The following output shows the following.

  1. Impact of a SHOW SESSION STATUS increments Created_tmp_tables
  2. A two table update increments Created_tmp_tables
  3. A two table update including a text field increments Created_tmp_tables and Created_tmp_disk_tables
  4. A single table update with the text field increments Created_tmp_tables

So the culprit was the combination of a two table update and a TEXT field.

mysql> source test.sql
--------------
SHOW SESSION STATUS LIKE '%tmp%tables'
--------------

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 3     |
+-------------------------+-------+
2 rows in set (0.00 sec)

--------------
SHOW SESSION STATUS LIKE '%tmp%tables'
--------------

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 4     |
+-------------------------+-------+
2 rows in set (0.00 sec)

--------------
UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20 WHERE a.id = b.id and a.id=1
--------------

Query OK, 2 rows affected (0.02 sec)
Rows matched: 2  Changed: 2  Warnings: 0

--------------
SHOW SESSION STATUS LIKE '%tmp%tables'
--------------

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 6     |
+-------------------------+-------+
2 rows in set (0.00 sec)

--------------
UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20,b.t=REPEAT('b',5000) WHERE a.id = b.id and a.id=3
--------------

Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

--------------
SHOW SESSION STATUS LIKE '%tmp%tables'
--------------

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 8     |
+-------------------------+-------+
2 rows in set (0.00 sec)

--------------
UPDATE b SET t = REPEAT('a',5000) WHERE b.id=2
--------------

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

--------------
SHOW SESSION STATUS LIKE '%tmp%tables'
--------------

+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 9     |
+-------------------------+-------+
2 rows in set (0.00 sec)
Posted under Databases, MySQL, Professional on 21 Jun 2007
Comments (0)
Jun
21

The Facebook effect

Link to this post

First it was the SlashDot effect, then it was the Digg effect, now it’s the Facebook effect. I have a friend at Facebook and he was talking about the effect of the Facebook Platform API that was released a few weeks ago. Sites were now struggling to cope with the effect of massive amounts of new traffic, ensuring that experienced MySQL Consultants will have plenty of scale out opportunities.

Here is an abstract from an article I read recently. Analyzing the Facebook Platform, three weeks in


Translation: unless you already have, or are prepared to quickly procure, a 100-500+ server infrastructure and everything associated with it — networking gear, storage gear, ISP interconnetions, monitoring systems, firewalls, load balancers, provisioning systems, etc. — and a killer operations team, launching a successful Facebook application may well be a self-defeating proposition.

This is a “success kills” scenario — the good news is you’re successful, the bad news is you’re flat on your back from what amounts to a self-inflicted denial of service attack, unless you have the money and time and knowledge to tackle the resulting scale challenges.

This comes from the success of iLike. Some more reading references are Crazy love when startup iLike hits pay dirt and Holy cow… 6mm users and growing 300k/day!. Wow!

You can’t buy viral marketing with this type of traffic growth figures. Given the current MySQL 12 Days of Scale-out and recent experiences where clients are seeking HA & scale-out solutions but have not architectured present systems to manage any level of scale-out via the MySQL proven techniques of replication and sharding.

It’s an important lesson that any organization wanting to develop a successful web site needs to ensure the architecture is designed with massive scale-out in mind from the beginning. This means starting with your application supporting partitioning of your data (both vertically and horizontally), and supporting replication, including the possibility of lag with MySQL Replication slaves.

You see on a lot of larger Web 2.0 sites these days after saving data, a message like “Your information will be available momentarily” or a message of this nature and the data saved not automatically displayed. This is a clear means of supporting lag, even if only for a few milliseconds. This is just the first of many steps in application design managing scale-out architectures.

Posted under Databases, MySQL, Professional on 21 Jun 2007
Comments (0)
Jun
21

Top 5 wishes for MySQL

Link to this post

Note: My views are just that: mine.

1. Real time Query Monitoring

MySQL 5.0 GA provides only 3 ways to look at queries that are executed on a server in some way or another. Slow Query Log, General Query Log and Binary Log. All require a server reboot to activate and de-activate. In a production system, it’s sometimes critical to be able to know “what is going on”, and you simply can’t reboot the server twice (once to turn on, once to turn off). 5.1 goes some way with Log Tables to being able to turn on General and Slow Logs into tables. Question is, as Kevin Burton listed in his points, when is 5.1 going to be out.

Real time query monitoring also needs to have a granularity of operation better then “server”. There needs to be a capacity to assign this on per connection basis. A server is being hammered, certain status variables are increasing greatly, I need to know now what queries are causing this. MySQL provides no means of doing this. MySQL Proxy is a great new idea/project people will start hearing more about, and it goes a long way to helping, but it’s not dynamic in that I can simply turn on logging on a production system without impact to the MySQL server or connections.

Real time query monitoring granularity of time, also need to be in better units, it’s very difficult to find slow running queries > 100ms when the present granularity is seconds. MySQL Proxy as mentioned, and also Connector/J provide this, (BTW Connector/J has excellent features in it’s many connection options if you develop with Java but it’s yet another output to look at, and when your application server and database servers are on different machine architectures it’s a lot of work to sync).

I am also behind SHOW PROFILE. I’d like to see it being able to be attached to existing connections, and applied to queries, and then output discarded for a time base condition (say < 100ms). Granted the act of observation slows things done, it the ability to be able to observe, see and use the information that’s around in bits that would be a start.

2. Consistent Release Cycles

As Stewart pointed out, it’s ridiculous to have 2 years for release of a product. A consistent cycle is needed. 5.0 Change Notes shows first release to GA 22 months. 5.1 Change Notes still not GA is at 19 months.
We talk now about getting new features, 5.1 is frozen, 5.2 seems lost now in any discussions. 6.0 has a few key features but I’m sure significant new features will be limited to ensure the exposure of Falcon. So, a key new feature maybe in 6.1+

Scope creep, lack of clear planning, test coverage, and user community contribution I feel are all factors. I know that user community testing and contribution is continuing to increase and I applaud the valuable contribution of the community. I wish I could do more myself.

3. Information Schema Extensions

I’ve heard of a Pluggable API for I_S tables in 5.1 Could somebody really confirm? I’ve seen Google doing File System Storage Engines (e.g. for /proc info). I really, really, really wish things like 5.1 processlist and status/variables tables were backported to 5.0 to start with.

In addition, now that I have started, there is need for more detailed information on queries, extensions of status variables that are needed. SHOW PROFILE goes someway internally to indicate what’s going on, but knowing that a certain buffer is being used, and what portion of it per connection will help in sizing. It’s important for example to know the tmp_table_size actually used in a result set, VARCHAR and COLLATION have a huge effect that people simply don’t consider. As the number of Pluggable storage engines increases, the need to know what is really happening is going to be more important. Some of this may be more in relation to Real Time Query Monitoring, but I feel certain additional information is needed to be stored.

4. Online Table Maintenance

It was not until I had to time operations recently for ALTER TABLE ADD COLUMN|INDEX did I realize the extent of the time it takes for InnoDB tables (i.e. is takes your database table offline during this time for any OLTP). My tests were taking over an hour (and I was not in the 3 digit GB range for a table). A real uptime system can’t support downtime like this. Traditionally large scale out MySQL applications have been developed around this limitation, however to compete more with Enterprise experiences, and resources coming from enterprise background this is simply not an option. Add the fact you can’t add a datafile to a Innodb Tablespace online (why not!). While speaking of datafiles, I echo Frank’s comment with the limitation of when using innodb_file_per_table, you can’t copy the file between MySQL Instances (assuming for example all the same version of H/W, O/S, MySQL).

I really hope that Falcon addresses these issues to provide a transactional storage engine offering with these enterprise uptime features.

5. Published Benchmarks

MySQL does not publish any benchmarks, well at least not what I know about. The first problem is: how long is a piece of string. There are millions of variables, but it would be great if even a number of cases of straight forward cases were proven. People may then have a better indication of baseline systems.

Here is my initial wish list.

  • Classification of Server Configuration. Let’s say a comparison between 1 CPU (dual Core), 2 CPU (dual Core) and 4 CPU (dual Core) commodity H/W. With the same memory (4G), disk (local disk), O/S (64 bit to support > 4GB), sample data (20-50GB) and same queries (simple OLTP only) just what are the benefits. Can we get a cost of throughput to $ Cost.
  • Disk Configuration. Just how does local Raid 1/Raid 5/Raid 10, compare with SAN (Raid 5) and SAN (Raid 10). Ok there are many types of disks as well as Raid, but start with commodity SATA 72K, 8MB cache. In addition how does a machine with 6 drives (in Raid 1 with OS,Data,Logs split) compare with Raid 5 or Raid 10.
  • RAM. How do our tests run when we take a 4GB system with 20-50GB of data and give it now 16GB.
  • Backup/Recovery. How long does it take to backup and restore.
  • Admin, how long does it take to ALTER TABLE, add data file, even something simply like how long to load data into a memory table across H/W has been interesting

As you can see it turns into a nightmare quickly, we didn’t talk about storage engines like MyISAM/InnoDB, tuning parameters, different O/S etc but surely something really is better then nothing. If only there was a baseline of data and queries to start with. Surely with the data sources available out there, some enterprising person could create a 20GB, 50GB + realistic production type data source, and 20-50 OLTP queries and we have a baseline.

There is talk of the Build Farm by Jay for compiling, let’s get that baseline so we can run some tests across thousands of configurations.
If sufficient work was done by MySQL to get some standard start, then the community might take up the challenge of taking the data/queries/benchmarking framework and test on all the configurations out there, tune to the wazoo and provide back to the MySQL Forge data for everybody to look at.

One day, knowing that this type of disk with this type of battery backup in this RAID configuration just isn’t anywhere near as good as 3 other types of options at roughly the same price.

The Rest

There are more, but in keeping with the spirit of 5 knowing that at least 2 people have shown scope creep already I’ll stop. I really want to mention more.

More About Top 5

Jay Pipes started the Top 5 wishes for MySQL recently. Here are the Planet MySQL contributors to date.

Jay Pipes
Marten Mickos - MySQL CEO
Stewart Smith
Kevin Burton
Farhan Mashaqi
Jeremy Cole

It’s almost like a chain letter, so I’ll start it by passing it onto 3 more, my challenge is to: — Mike Kruckenberg, my evil(he isn’t really evil) twin - Roland Bouman & Paul McCullagh.

27 June 2007 Update
Since my posting we have also had:
Antony Curtis
Alan Kasindorf
Jim Winstead
Jonathon Coombes

Posted under Databases, MySQL, MySQL Proxy, Professional on 21 Jun 2007
Comments (0)
Jun
15

MySQL - Wikipedia

Link to this post


I was reading only last week the notes from Wikipedia: Site Internals, Configuration and Code Examples, and Management Issues Tutorial by Domas Mituzas at the recent 2007 MySQL Conference. I didn’t attend this session, like a lot of sessions too much good stuff at the same time.

It’s obviously taken a while to catch up on my reading, but with the present MySQL 12 days of Scale-Out I thought I’ll complete my notes for all to see.

If you have never used Wikipedia well, why are you reading this, you should spend an hour there now. Alexa places Wikipedia in one of the top 10 visited sites on the Internet.

Wikipedia runs on the LAMP stack, powered by the MySQL database. Nothing new here, but how Wikipedia scales is. Some of the interesting points involved how a “Content Delivery Network” was build with components including Squid, Lighttpd, Memcached, LVS to improve caching. Appropriate caching is an important component to a successful scale-out infrastructure. An interesting quote however:

The common mistake is to believe that database is too slow and everything in it has to becached somewhere else. In scaled out environments reads are very efficient, and difference of time between efficient MySQL query and memcached request is negligible - both may execute in less than 1ms usually).
Sometimes memcached request may be more expensive than database query simply because it has to establish connection to a server, whereas database connection is usually
open.

Wikipedia has a developed an application Load Balancer. This offers a flexibility in efficient database use and is critical to any scale-out infrastructure. Combined with a good Database API and items such as the Pager class, allows you to write efficient index-based offsets pager (instead of ugly LIMIT 50 OFFSET 10000) syntax for example.

The main ideology in operating database servers is RAIS: - Redundant Array of Inexpensive/Independent/Instable[sic] Servers

  • RAID0. Seems to provide additional performance/space. Having half of that capacity with an idea that a failure will degrade performance even more doesn’t sound like an efficient idea. Also, we do notice disk problems earlier. This disk configuration should be probably called AID, instead of RAID0.
  • innodb_flush_log_at_trx_commit=0, tempted to do innodb_flush_method=nosync. If a server crashes for any reason, there’s not much need to look at its data consistency. Usually that server will need hardware repairs anyway. InnoDB transaction recovery would take half an hour or more. Failing over to another server will usually be much faster. In case of master failure, last properly replicated event is last event for whole environment. No ‘last transaction contains millions’ worries makes the management of such environment much easier - an idea often forgotten by web applications people.

The thing I found interesting in the RAIS mysql-node configuration was slave-skip-errors=0,1213,1158,1053,1007,1062

However, the greatest tip is “All database interaction is optimized around MySQL’s methods of reading the data.” This includes:

  • Every query must have appropriate index for reads…
  • Every query result should be sorted by index, not by filesorts. This means strict and predictable path of data access…
  • Some fat-big-tables have extended covering indexing just on particular nodes…
  • Queries prone to hit multiversioning troubles have to be rewritten accordingly…

Wikipedia is also clever in it’s sharding. A means to implement vertical and horizontal partition of data via the application for optimal scale-out. This comes down to designing your application correct from the start. Wikipedia considers it’s partition via:

  • data segments
  • tasks
  • time

HiveDB (not used by wikipedia) is open source framework for horizontally partitioning MySQL systems. Well worth reviewing.

Wikipedia also makes use of compression. This works when your data can be compressed well like text. This improves performance, however analysis on other projects have shown this does place a CPU impact on the server so it is important to monitor and use appropriately.

Another clever approach is to move searching to tools more appropriate for this task, in this case Lucene. As with any scale-out it is important to leverage the power of appropriate tools for maximum benefit.

I have only summarized Domas’ notes. It’s well worth a detailed read.

Posted under Databases, MySQL, Professional on 15 Jun 2007
Comments (0)
Jun
15

MySQL - Testing failing non-transactional statements

Link to this post

I was asked recently to confirm a consistent state of data in a non-transactional MySQL table after a failing statement updating multiple rows did not complete successfully.

Hmmm, this is what I did.

  • Created a MEMORY table
  • Populated with some data, and a Primary Key
  • Updated the Primary Key so that it failed with a Duplicate Key Error after updating only half the rows
  • Confirmed that the rows that were updated, were, and the rows that were not updated, were not
DROP TABLE IF EXISTS mem1;
CREATE TABLE mem1(
i1  INT UNSIGNED NOT NULL PRIMARY KEY,
c1 CHAR(10) NOT NULL,
dt TIMESTAMP)
ENGINE=MEMORY;

INSERT INTO mem1(i1,c1) VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e');
SELECT * FROM mem1;
+----+----+---------------------+
| i1 | c1 | dt                  |
+----+----+---------------------+
|  1 | a  | 2007-06-14 17:26:29 |
|  2 | b  | 2007-06-14 17:26:29 |
|  3 | c  | 2007-06-14 17:26:29 |
|  4 | d  | 2007-06-14 17:26:29 |
|  5 | e  | 2007-06-14 17:26:29 |
+----+----+---------------------+
5 rows in set (0.00 sec)

UPDATE mem1 SET i1 = 9 - i1 - SLEEP(1), c1='x' ORDER BY i1;
ERROR 1062 (23000): Duplicate entry '5' for key 1
SELECT * FROM mem1;
+----+----+---------------------+
| i1 | c1 | dt                  |
+----+----+---------------------+
|  8 | x  | 2007-06-14 17:29:05 |
|  7 | x  | 2007-06-14 17:29:05 |
|  6 | x  | 2007-06-14 17:29:05 |
|  4 | d  | 2007-06-14 17:28:36 |
|  5 | e  | 2007-06-14 17:28:36 |
+----+----+---------------------+
5 rows in set (0.00 sec)

While I was also hoping for the TIMESTAMP column to reflect when the row was modified, it was when the statement was executed.

This test did however prove the requirements. Simple when you think about it, but it took a few minutes to think about it the first time.

Posted under Databases, MySQL, Professional on 15 Jun 2007
Comments (0)
Jun
08

Some comments of ‘Five months with MySQL Cluster’

Link to this post

I recently saw the Planet MySQL post Five months with MySQL Cluster by Brian Moon.

Thought I’d add my 5 cents worth (Australian’s don’t have 1 cent coins any more to make 2 cents worth)

Firstly, it’s great you wrote about your experiences in moving to MySQL Cluster. I think more people should.

Joins

“We used a lot of joins. We learned (later) that joins in the cluster are not a good idea.”

MySQL Cluster’s number one strength is Primary Key Lookups, it’s very good you learned that joins (especially say 5-6-7 table joins) are not a good idea, it is not a strength, and certainly not scalable to thousands of queries per second.

Rewrite

“We rewrote our application (basically, our public web site) to use the new cluster and its new table design.”

It’s great you acknowledged that you had to rewrite your application. I’m sure the attitude of people in the industry is: We need more HA, MySQL offers MySQL Cluster as a HA 5×9s solution, let’s just put out existing application on Cluster. This simply does not provide the results people think, and could in theory result in disaster, particularly when choosing H/W, see next point.

I would expand on a few cases of what you re-wrote, this level of education will help educate the masses.

Hardware

“Six new Dell dual-core, dual processor Opterons with a lot of memory”.

First, MySQL Cluster is an in-memory database, so lots of memory is essential. Second, Data Nodes are a single threaded process, so even with 4 cores your H/W will be underutilized as data nodes.

If an organization wants to say get two 4 CPU Dual Core machines (that’s 2×8 cores per machine), it’s impractical to use as Data Nodes. Far greater performance, reliability and scalability is obtained by having 8×2 core machines. The issue then becomes power consumption and rack space, this is what hurts MySQL Cluster. It’s important to remember MySQL Cluster was designed to run on low commodity hardware, and a lot of it.

“So, we configured the cluster with 4 replicas.”

Interesting. You don’t see many references to more then the default, documented and accepted 2 replicas.

Administration

“MySQL Cluster is a whole new animal.”

Yes it is, an ALL DUMP 1000 for example, and then having to parse log files for the “right” strings needs to be improved for example just to determine memory utilization. You may also want to check out ndbtop. I managed to get an earlier version working, but never really had the time to delve more. Monty also may have some admin stuff of interest buried within his NDB Connectors work.

Conclusions

“What a moron!”

Far from it, I hope your article helps in the education of MySQL Cluster more to the community. I’m certainly going to reference my responses to your article as “key considerations” in considering MySQL Cluster for existing applications.

I would add that with MySQL Cluster you require all equipment to be within a LAN, even the same switch. This is important, MySQL Cluster does not work in a WAN situation. I’ve seen an example H/W provided for a trial Cluster with some machines in a West Coast data center, and some in an East Coast data center.

You also can’t have a lag slave for example as in a Master/Slave environment.

You didn’t mention any specific sizes for data, I’d be interested to know, particularly growth and how you will manage that?
You also didn’t mention anything about Disk? MySQL Cluster may be an in-memory database but it does a lot of disk work, and having appropriate disk is important. People overlook that.
You didn’t mention anything about timings? Like how does backups for example compare now to previously.
You didn’t mention version? 5.1 (not GA) is significant improvement in memory utilization due to true varchar support, saving a lot of memory, but as I said not yet production software.

Posted under Databases, MySQL, Professional on 08 Jun 2007
Comments (3)
Newer Posts »
Home
Professional Blog RSS Feed of Professional Blog
Consulting
Presentations
About Ronald
Related Links
Contact Ronald
  • « May spinner iCalendar Jul »
    June 2007
    M T W T F S S
     123
    45678910
    11121314151617
    18192021222324
    252627282930 
  • Categories:
    • Professional
      • 42SQL
      • Apple
        • iPhone
        • MacBook
        • OS/X
      • Clever Design
      • Cloud Computing
        • 10gen
        • AppNexus
        • Kaavo
        • Kloudshare
      • Databases
        • Drizzle
        • Ingres
        • MySQL
          • Compiling
          • GUI Products
          • MySQL Events
            • mysqlcamp01
            • mysqlcamp02
          • MySQL Proxy
          • MySQL User Conferences
            • mysqluc06
            • mysqluc07
            • mysqluc08
          • Storage Engines
            • Non Transactional
              • Infobright
              • KickFire
              • Maria
              • Nitro
            • Transactional
              • Blob Streaming
              • Falcon
              • InnoDB
              • PBXT
              • Solid
        • Oracle
      • Extreme Programming (XP)
      • General
      • Java
        • Tomcat
      • Linux
        • One Liners
      • Microsoft
      • Open Source
        • Buildbot
        • Ubuntu
        • UltimateLAMP
        • Virtual Box
      • OSCON 2008
      • PrimeBase Technologies
      • Solid State Drives
      • Sun
      • The Daily WTF
      • Windoze
      • Yahoo
    • Web
      • Google
        • App Engine
        • Summer of Code
      • Web Development
        • Amazon
          • EC2
          • S3
          • SimpleDB
        • CSS
        • HTML
        • PHP
        • Web 2.0
      • Web Sites
        • Application Software
        • Content
        • Cool Tools
        • Linux Stuff
        • MySQL Related
        • Show Your Stuff
        • Twitter
        • Unype
      • WordPress
  • Pages:
    • Best Of PlanetMySQL Articles
    • Interesting Articles
    • MediaWiki Restyling (1)

  • Archives:
    • August 2008
    • July 2008
    • June 2008
    • May 2008
    • April 2008
    • March 2008
    • February 2008
    • January 2008
    • December 2007
    • November 2007
    • October 2007
    • September 2007
    • August 2007
    • July 2007
    • June 2007
    • May 2007
    • April 2007
    • March 2007
    • February 2007
    • January 2007
    • December 2006
    • November 2006
    • October 2006
    • September 2006
    • August 2006
    • July 2006
    • June 2006
    • May 2006
    • April 2006
    • March 2006
    • February 2006
    • January 2006
    • December 2005
    • November 2005
    • October 2005
    • September 2005
    • July 2005
    • June 2005
    • February 2005
    • October 2004
    • September 2004
    • July 2004
    • June 2004