Query Cache is deterministic

I was reading some points on a Blog post today here. I didn’t see it via Planet MySQL but via a Google Alert.

The post has several good beginner points on MySQL Performance, but included the point “Literal current date is better than using the CURRENT_DATE() function because literals will use the query cache but functions won’t.”

How true, I must admit I’d not considered this in reviewing code for sites that use the Query Cache heavily. Non deterministic functions invalid use of the Query Cache (Except for recent Bug #29053 which is also an interesting read. Note to self, read the new bugs more often when free time permits). Back on point.

So if you have a query like “Select the news for today” and use CURDATE(), it will not be stored in the Query Cache, yet if your application code determines the current date and passes “Select the news for 2007-06-26″ for example it will.

Sometimes the simple things are easy to forget, or not have thought about in this case for me.

My 'Hourly' MySQL Monitor Script Version 0.05

I’ve been able to steal some more time to work on my script following My ‘hourly’ MySQL monitor script Version 0.03 almost a month ago.

The purpose of this script is simple. Create an hourly ‘cron’ job that records and monitors information against the OS and MySQL Server. This is most helpful when environments simply don’t have any monitoring in place. I’ve found it very productive when running a benchmark on site to simply enable for an hour. An amount of analysis is required, but I at least have a baseline of data collection. That is the first goal.

So hourly.0.05.tar.gz is taking some shape, and has the following new features:

  • Provide configurable flag to enable/disable OS and MySQL tests
  • Added MYSQL_SID support, allowing for running MySQL tests against multiple instances on a single server
  • Added configurable mysql.conf support, again supporting multiple server instances
  • Revised file name standards, again to support multiple server instances
  • Added additional error checking for correct MySQL Configuration and operational environment
  • Corrected file pathing of dependent files

My next step now is to provide some simple analysis, I’ll be starting with Statpack Version 2 by colleague Mark Leith.

Watching for Disk Based Tables being created

I didn’t know you could actually do this before addressing this problem in a benchmark using 5.0.36.

MySQL allows you to update two tables with a single UPDATE statement. I knew you could reference two tables in an UPDATE statement but not update both. However when working with a client after benchmarking I observed a large number of Created_tmp_disk_tables via SHOW GLOBAL STATUS and found that this query was the offending query only for certain circumstances. Understanding took a little longer.




INSERT INTO b VALUES(1,1,'a',REPEAT('The quick brown fox jumped over the lazy dog',100));
INSERT INTO b VALUES(2,2,'b',REPEAT('The quick brown fox jumped over the lazy dog',200));
INSERT INTO b VALUES(3,3,'c',REPEAT('The quick brown fox jumped over the lazy dog',300));
INSERT INTO b VALUES(4,4,'d',REPEAT('The quick brown fox jumped over the lazy dog',400));


UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20 WHERE a.id = b.id and a.id=1;
UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20,b.t=REPEAT('b',5000) WHERE a.id = b.id and a.id=3;
UPDATE b SET t = REPEAT('a',5000) WHERE b.id=2;


The following output shows the following.

  1. Impact of a SHOW SESSION STATUS increments Created_tmp_tables
  2. A two table update increments Created_tmp_tables
  3. A two table update including a text field increments Created_tmp_tables and Created_tmp_disk_tables
  4. A single table update with the text field increments Created_tmp_tables

So the culprit was the combination of a two table update and a TEXT field.

mysql> source test.sql

| Variable_name           | Value |
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 3     |
2 rows in set (0.00 sec)


| Variable_name           | Value |
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 4     |
2 rows in set (0.00 sec)

UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20 WHERE a.id = b.id and a.id=1

Query OK, 2 rows affected (0.02 sec)
Rows matched: 2  Changed: 2  Warnings: 0


| Variable_name           | Value |
| Created_tmp_disk_tables | 0     |
| Created_tmp_tables      | 6     |
2 rows in set (0.00 sec)

UPDATE a,b SET a.i = 10, a.c = 'x', b.i = 20,b.t=REPEAT('b',5000) WHERE a.id = b.id and a.id=3

Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0


| Variable_name           | Value |
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 8     |
2 rows in set (0.00 sec)

UPDATE b SET t = REPEAT('a',5000) WHERE b.id=2

Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0


| Variable_name           | Value |
| Created_tmp_disk_tables | 1     |
| Created_tmp_tables      | 9     |
2 rows in set (0.00 sec)

The Facebook effect

First it was the SlashDot effect, then it was the Digg effect, now it’s the Facebook effect. I have a friend at Facebook and he was talking about the effect of the Facebook Platform API that was released a few weeks ago. Sites were now struggling to cope with the effect of massive amounts of new traffic, ensuring that experienced MySQL Consultants will have plenty of scale out opportunities.

Here is an abstract from an article I read recently. Analyzing the Facebook Platform, three weeks in

Translation: unless you already have, or are prepared to quickly procure, a 100-500+ server infrastructure and everything associated with it — networking gear, storage gear, ISP interconnetions, monitoring systems, firewalls, load balancers, provisioning systems, etc. — and a killer operations team, launching a successful Facebook application may well be a self-defeating proposition.

This is a “success kills” scenario — the good news is you’re successful, the bad news is you’re flat on your back from what amounts to a self-inflicted denial of service attack, unless you have the money and time and knowledge to tackle the resulting scale challenges.

This comes from the success of iLike. Some more reading references are Crazy love when startup iLike hits pay dirt and Holy cow… 6mm users and growing 300k/day!. Wow!

You can’t buy viral marketing with this type of traffic growth figures. Given the current MySQL 12 Days of Scale-out and recent experiences where clients are seeking HA & scale-out solutions but have not architectured present systems to manage any level of scale-out via the MySQL proven techniques of replication and sharding.

It’s an important lesson that any organization wanting to develop a successful web site needs to ensure the architecture is designed with massive scale-out in mind from the beginning. This means starting with your application supporting partitioning of your data (both vertically and horizontally), and supporting replication, including the possibility of lag with MySQL Replication slaves.

You see on a lot of larger Web 2.0 sites these days after saving data, a message like “Your information will be available momentarily” or a message of this nature and the data saved not automatically displayed. This is a clear means of supporting lag, even if only for a few milliseconds. This is just the first of many steps in application design managing scale-out architectures.

Top 5 wishes for MySQL

Note: My views are just that: mine.

1. Real time Query Monitoring

MySQL 5.0 GA provides only 3 ways to look at queries that are executed on a server in some way or another. Slow Query Log, General Query Log and Binary Log. All require a server reboot to activate and de-activate. In a production system, it’s sometimes critical to be able to know “what is going on”, and you simply can’t reboot the server twice (once to turn on, once to turn off). 5.1 goes some way with Log Tables to being able to turn on General and Slow Logs into tables. Question is, as Kevin Burton listed in his points, when is 5.1 going to be out.

Real time query monitoring also needs to have a granularity of operation better then “server”. There needs to be a capacity to assign this on per connection basis. A server is being hammered, certain status variables are increasing greatly, I need to know now what queries are causing this. MySQL provides no means of doing this. MySQL Proxy is a great new idea/project people will start hearing more about, and it goes a long way to helping, but it’s not dynamic in that I can simply turn on logging on a production system without impact to the MySQL server or connections.

Real time query monitoring granularity of time, also need to be in better units, it’s very difficult to find slow running queries > 100ms when the present granularity is seconds. MySQL Proxy as mentioned, and also Connector/J provide this, (BTW Connector/J has excellent features in it’s many connection options if you develop with Java but it’s yet another output to look at, and when your application server and database servers are on different machine architectures it’s a lot of work to sync).

I am also behind SHOW PROFILE. I’d like to see it being able to be attached to existing connections, and applied to queries, and then output discarded for a time base condition (say < 100ms). Granted the act of observation slows things done, it the ability to be able to observe, see and use the information that’s around in bits that would be a start.

2. Consistent Release Cycles

As Stewart pointed out, it’s ridiculous to have 2 years for release of a product. A consistent cycle is needed. 5.0 Change Notes shows first release to GA 22 months. 5.1 Change Notes still not GA is at 19 months.
We talk now about getting new features, 5.1 is frozen, 5.2 seems lost now in any discussions. 6.0 has a few key features but I’m sure significant new features will be limited to ensure the exposure of Falcon. So, a key new feature maybe in 6.1+

Scope creep, lack of clear planning, test coverage, and user community contribution I feel are all factors. I know that user community testing and contribution is continuing to increase and I applaud the valuable contribution of the community. I wish I could do more myself.

3. Information Schema Extensions

I’ve heard of a Pluggable API for I_S tables in 5.1 Could somebody really confirm? I’ve seen Google doing File System Storage Engines (e.g. for /proc info). I really, really, really wish things like 5.1 processlist and status/variables tables were backported to 5.0 to start with.

In addition, now that I have started, there is need for more detailed information on queries, extensions of status variables that are needed. SHOW PROFILE goes someway internally to indicate what’s going on, but knowing that a certain buffer is being used, and what portion of it per connection will help in sizing. It’s important for example to know the tmp_table_size actually used in a result set, VARCHAR and COLLATION have a huge effect that people simply don’t consider. As the number of Pluggable storage engines increases, the need to know what is really happening is going to be more important. Some of this may be more in relation to Real Time Query Monitoring, but I feel certain additional information is needed to be stored.

4. Online Table Maintenance

It was not until I had to time operations recently for ALTER TABLE ADD COLUMN|INDEX did I realize the extent of the time it takes for InnoDB tables (i.e. is takes your database table offline during this time for any OLTP). My tests were taking over an hour (and I was not in the 3 digit GB range for a table). A real uptime system can’t support downtime like this. Traditionally large scale out MySQL applications have been developed around this limitation, however to compete more with Enterprise experiences, and resources coming from enterprise background this is simply not an option. Add the fact you can’t add a datafile to a Innodb Tablespace online (why not!). While speaking of datafiles, I echo Frank’s comment with the limitation of when using innodb_file_per_table, you can’t copy the file between MySQL Instances (assuming for example all the same version of H/W, O/S, MySQL).

I really hope that Falcon addresses these issues to provide a transactional storage engine offering with these enterprise uptime features.

5. Published Benchmarks

MySQL does not publish any benchmarks, well at least not what I know about. The first problem is: how long is a piece of string. There are millions of variables, but it would be great if even a number of cases of straight forward cases were proven. People may then have a better indication of baseline systems.

Here is my initial wish list.

  • Classification of Server Configuration. Let’s say a comparison between 1 CPU (dual Core), 2 CPU (dual Core) and 4 CPU (dual Core) commodity H/W. With the same memory (4G), disk (local disk), O/S (64 bit to support > 4GB), sample data (20-50GB) and same queries (simple OLTP only) just what are the benefits. Can we get a cost of throughput to $ Cost.
  • Disk Configuration. Just how does local Raid 1/Raid 5/Raid 10, compare with SAN (Raid 5) and SAN (Raid 10). Ok there are many types of disks as well as Raid, but start with commodity SATA 72K, 8MB cache. In addition how does a machine with 6 drives (in Raid 1 with OS,Data,Logs split) compare with Raid 5 or Raid 10.
  • RAM. How do our tests run when we take a 4GB system with 20-50GB of data and give it now 16GB.
  • Backup/Recovery. How long does it take to backup and restore.
  • Admin, how long does it take to ALTER TABLE, add data file, even something simply like how long to load data into a memory table across H/W has been interesting

As you can see it turns into a nightmare quickly, we didn’t talk about storage engines like MyISAM/InnoDB, tuning parameters, different O/S etc but surely something really is better then nothing. If only there was a baseline of data and queries to start with. Surely with the data sources available out there, some enterprising person could create a 20GB, 50GB + realistic production type data source, and 20-50 OLTP queries and we have a baseline.

There is talk of the Build Farm by Jay for compiling, let’s get that baseline so we can run some tests across thousands of configurations.
If sufficient work was done by MySQL to get some standard start, then the community might take up the challenge of taking the data/queries/benchmarking framework and test on all the configurations out there, tune to the wazoo and provide back to the MySQL Forge data for everybody to look at.

One day, knowing that this type of disk with this type of battery backup in this RAID configuration just isn’t anywhere near as good as 3 other types of options at roughly the same price.

The Rest

There are more, but in keeping with the spirit of 5 knowing that at least 2 people have shown scope creep already I’ll stop. I really want to mention more.

More About Top 5

Jay Pipes started the Top 5 wishes for MySQL recently. Here are the Planet MySQL contributors to date.

Jay Pipes
Marten Mickos – MySQL CEO
Stewart Smith
Kevin Burton
Farhan Mashaqi
Jeremy Cole

It’s almost like a chain letter, so I’ll start it by passing it onto 3 more, my challenge is to: — Mike Kruckenberg, my evil(he isn’t really evil) twin – Roland Bouman & Paul McCullagh.

27 June 2007 Update
Since my posting we have also had:
Antony Curtis
Alan Kasindorf
Jim Winstead
Jonathon Coombes

MySQL – Wikipedia

I was reading only last week the notes from Wikipedia: Site Internals, Configuration and Code Examples, and Management Issues Tutorial by Domas Mituzas at the recent 2007 MySQL Conference. I didn’t attend this session, like a lot of sessions too much good stuff at the same time.

It’s obviously taken a while to catch up on my reading, but with the present MySQL 12 days of Scale-Out I thought I’ll complete my notes for all to see.

If you have never used Wikipedia well, why are you reading this, you should spend an hour there now. Alexa places Wikipedia in one of the top 10 visited sites on the Internet.

Wikipedia runs on the LAMP stack, powered by the MySQL database. Nothing new here, but how Wikipedia scales is. Some of the interesting points involved how a “Content Delivery Network” was build with components including Squid, Lighttpd, Memcached, LVS to improve caching. Appropriate caching is an important component to a successful scale-out infrastructure. An interesting quote however:

The common mistake is to believe that database is too slow and everything in it has to becached somewhere else. In scaled out environments reads are very efficient, and difference of time between efficient MySQL query and memcached request is negligible – both may execute in less than 1ms usually).
Sometimes memcached request may be more expensive than database query simply because it has to establish connection to a server, whereas database connection is usually

Wikipedia has a developed an application Load Balancer. This offers a flexibility in efficient database use and is critical to any scale-out infrastructure. Combined with a good Database API and items such as the Pager class, allows you to write efficient index-based offsets pager (instead of ugly LIMIT 50 OFFSET 10000) syntax for example.

The main ideology in operating database servers is RAIS: – Redundant Array of Inexpensive/Independent/Instable[sic] Servers

  • RAID0. Seems to provide additional performance/space. Having half of that capacity with an idea that a failure will degrade performance even more doesn’t sound like an efficient idea. Also, we do notice disk problems earlier. This disk configuration should be probably called AID, instead of RAID0.
  • innodb_flush_log_at_trx_commit=0, tempted to do innodb_flush_method=nosync. If a server crashes for any reason, there’s not much need to look at its data consistency. Usually that server will need hardware repairs anyway. InnoDB transaction recovery would take half an hour or more. Failing over to another server will usually be much faster. In case of master failure, last properly replicated event is last event for whole environment. No ‘last transaction contains millions’ worries makes the management of such environment much easier – an idea often forgotten by web applications people.

The thing I found interesting in the RAIS mysql-node configuration was slave-skip-errors=0,1213,1158,1053,1007,1062

However, the greatest tip is “All database interaction is optimized around MySQL’s methods of reading the data.” This includes:

  • Every query must have appropriate index for reads…
  • Every query result should be sorted by index, not by filesorts. This means strict and predictable path of data access…
  • Some fat-big-tables have extended covering indexing just on particular nodes…
  • Queries prone to hit multiversioning troubles have to be rewritten accordingly…

Wikipedia is also clever in it’s sharding. A means to implement vertical and horizontal partition of data via the application for optimal scale-out. This comes down to designing your application correct from the start. Wikipedia considers it’s partition via:

  • data segments
  • tasks
  • time

HiveDB (not used by wikipedia) is open source framework for horizontally partitioning MySQL systems. Well worth reviewing.

Wikipedia also makes use of compression. This works when your data can be compressed well like text. This improves performance, however analysis on other projects have shown this does place a CPU impact on the server so it is important to monitor and use appropriately.

Another clever approach is to move searching to tools more appropriate for this task, in this case Lucene. As with any scale-out it is important to leverage the power of appropriate tools for maximum benefit.

I have only summarized Domas’ notes. It’s well worth a detailed read.

MySQL – Testing failing non-transactional statements

I was asked recently to confirm a consistent state of data in a non-transactional MySQL table after a failing statement updating multiple rows did not complete successfully.

Hmmm, this is what I did.

  • Created a MEMORY table
  • Populated with some data, and a Primary Key
  • Updated the Primary Key so that it failed with a Duplicate Key Error after updating only half the rows
  • Confirmed that the rows that were updated, were, and the rows that were not updated, were not

INSERT INTO mem1(i1,c1) VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e');
| i1 | c1 | dt                  |
|  1 | a  | 2007-06-14 17:26:29 |
|  2 | b  | 2007-06-14 17:26:29 |
|  3 | c  | 2007-06-14 17:26:29 |
|  4 | d  | 2007-06-14 17:26:29 |
|  5 | e  | 2007-06-14 17:26:29 |
5 rows in set (0.00 sec)

UPDATE mem1 SET i1 = 9 - i1 - SLEEP(1), c1='x' ORDER BY i1;
ERROR 1062 (23000): Duplicate entry '5' for key 1
| i1 | c1 | dt                  |
|  8 | x  | 2007-06-14 17:29:05 |
|  7 | x  | 2007-06-14 17:29:05 |
|  6 | x  | 2007-06-14 17:29:05 |
|  4 | d  | 2007-06-14 17:28:36 |
|  5 | e  | 2007-06-14 17:28:36 |
5 rows in set (0.00 sec)

While I was also hoping for the TIMESTAMP column to reflect when the row was modified, it was when the statement was executed.

This test did however prove the requirements. Simple when you think about it, but it took a few minutes to think about it the first time.

Some comments of 'Five months with MySQL Cluster'

I recently saw the Planet MySQL post Five months with MySQL Cluster by Brian Moon.

Thought I’d add my 5 cents worth (Australian’s don’t have 1 cent coins any more to make 2 cents worth)

Firstly, it’s great you wrote about your experiences in moving to MySQL Cluster. I think more people should.


“We used a lot of joins. We learned (later) that joins in the cluster are not a good idea.”

MySQL Cluster’s number one strength is Primary Key Lookups, it’s very good you learned that joins (especially say 5-6-7 table joins) are not a good idea, it is not a strength, and certainly not scalable to thousands of queries per second.


“We rewrote our application (basically, our public web site) to use the new cluster and its new table design.”

It’s great you acknowledged that you had to rewrite your application. I’m sure the attitude of people in the industry is: We need more HA, MySQL offers MySQL Cluster as a HA 5x9s solution, let’s just put out existing application on Cluster. This simply does not provide the results people think, and could in theory result in disaster, particularly when choosing H/W, see next point.

I would expand on a few cases of what you re-wrote, this level of education will help educate the masses.


“Six new Dell dual-core, dual processor Opterons with a lot of memory”.

First, MySQL Cluster is an in-memory database, so lots of memory is essential. Second, Data Nodes are a single threaded process, so even with 4 cores your H/W will be underutilized as data nodes.

If an organization wants to say get two 4 CPU Dual Core machines (that’s 2×8 cores per machine), it’s impractical to use as Data Nodes. Far greater performance, reliability and scalability is obtained by having 8×2 core machines. The issue then becomes power consumption and rack space, this is what hurts MySQL Cluster. It’s important to remember MySQL Cluster was designed to run on low commodity hardware, and a lot of it.

“So, we configured the cluster with 4 replicas.”

Interesting. You don’t see many references to more then the default, documented and accepted 2 replicas.


“MySQL Cluster is a whole new animal.”

Yes it is, an ALL DUMP 1000 for example, and then having to parse log files for the “right” strings needs to be improved for example just to determine memory utilization. You may also want to check out ndbtop. I managed to get an earlier version working, but never really had the time to delve more. Monty also may have some admin stuff of interest buried within his NDB Connectors work.


“What a moron!”

Far from it, I hope your article helps in the education of MySQL Cluster more to the community. I’m certainly going to reference my responses to your article as “key considerations” in considering MySQL Cluster for existing applications.

I would add that with MySQL Cluster you require all equipment to be within a LAN, even the same switch. This is important, MySQL Cluster does not work in a WAN situation. I’ve seen an example H/W provided for a trial Cluster with some machines in a West Coast data center, and some in an East Coast data center.

You also can’t have a lag slave for example as in a Master/Slave environment.

You didn’t mention any specific sizes for data, I’d be interested to know, particularly growth and how you will manage that?
You also didn’t mention anything about Disk? MySQL Cluster may be an in-memory database but it does a lot of disk work, and having appropriate disk is important. People overlook that.
You didn’t mention anything about timings? Like how does backups for example compare now to previously.
You didn’t mention version? 5.1 (not GA) is significant improvement in memory utilization due to true varchar support, saving a lot of memory, but as I said not yet production software.

Truemors, Tumors, Dribble

I was sent this email.


By the Numbers: How I built a Web 2.0, User-Generated Content, Citizen – Journalism, Long-Tail, Social Media Site for $12,107.09

I have had my flatmate talk about it a lot in the past day. So I checked it out. Here is my view.

My Review

I’ve heard the name “Guy Kawasaki” but never read his stuff, or followed him. My first impression was the 2007 MySQL Conference and I was suitably impressed. See MySQL Conference – The next keynote with Guy Kawasaki

So some of my comments following this site — Truemors.

$1,115.05. I spent $1,115.05 registering domains. I could have used GoDaddy and done it a lot cheaper, but I was too stupid and lazy.

55. I registered 55 domains (for example, truemors.net, .de, .biz, truemours, etc, etc). I had no idea that one had to buy so many domains to truly “surround” the one you use. Yes, I could have registered fewer and spent less, but who cares about saving a few hundred bucks compared to the cost of legal action to get a domain away from a squatter if Truemors is successful?

Me: Your an idiot, and I don’t use the term lightly. I had great respect after one presentation, I even referred to his talk last night at a meeting for people to improve their presentations, but you paid 3 times the price for one domain, ok, but for 55, and then he hasn’t covered the name to well. A few quick typos and people have already registered names and stuck up google ads.

4. I learned four lessons launching Truemors:
There’s really no such thing as bad PR.
$12,000 goes a very long way these days.
You can work with a team that is thousands of miles away.
Life is good for entrepreneurs these days.

Me: There is such a thing as bad PR if your not “Guy Kawasaki” a name that draws a crowd.
$12,000 does go a long way if you spend it wisely. $399 for a logo, I could have done in 10 mins in Photoshop, and I’m an amateur. I’ve got better free logos on my sites. You have to get to being a successful entrepreneur before you can say “life is good for entrepreneurs these days”


I could make a comment on a number of his points.
My conclusion is he certainly polarized people. The responses are either completely for, or completely against.

If you want to read on, here are a few comments I’ve taken from his own blog comments which I found interesting. For those lazy and just like scanning.
Oh, and if you go to his site, the top reference on his own site is people bagging his site. Yeah that’s really good PR – Screen print at the end.

Poor Guy! I have a number for you: number of websites needed to destroy your reputation: 1.

Wait, a minute you’re an “expert” on startups, and you spent more than a $1000 on domain name registration? Hahaha! LMAO

To me, Truemors would have been better if you:

1) Spent some money to have some juicy content prepared for launch to be submitted by random people.
2) Focused the product to a specific topic. Digg started out with technology only. Truemors could do celebrity gossip?

OK, here’s a “truemor”: I just scanned the comments below and Ray’s Jun 4, 2007 12:33:34 AM posting was so LOL funny *I* damn near messed myself. :-)

$4,824.14 for legal fees… What did the lawyers do for you? Did they come up with your terms of use? Do you have a privacy statement? Do tell!

Interesting numbers – but totally irrelevant! The whole point of business is to make money, not how carefully you spend what money you have! It would seem to me that Web 2.0 is all about hype – getting cheap eyeballs and selling them! Could you repost in a month or so and let us know when you’ve started making money?

You’ve proven you can start a web2.0 site for under 15k? WTF. Are you serious or are you just in your 30’s and senile? Most “successful” web2.0 sites get off the ground with less than $1,000.00. Truemors will be a graveyard in a month… might as well throw in the towel now.

You paid $399 for that logo!?? You could have told me…I would have created something better than that for free… Honestly, most of the things look a huge waste of money..
Guy, I guess this might offend you, but lately it seems you’ve been investing in some bad ideas. The idea for your new site is “okay”, however the layout is so terrible that the first time I saw the site, it was so annoying I skipped the article that brought me there and closed my browser tab.

Want to hatch a great idea?

I attended The Hatchery tonight. A rather brilliant opportunity for you to pitch your present idea “The Gauntlet”, and get “The Panel” to provide expert advice your friends are not going to tell you. Ultimately you are also seeking money as well as expertise.

The panel tonight for June 6, 2007 was excellent. A great mix of experience and expertise. I was impressed with the level of knowledge and diversity of questions asked. They were all smart people!

  • Emcee – Yao-Hui Huang
  • Natalia Allen – Design Futurist, Schlossberg Flynn
  • Anne Andiorio – Digital media, services, technology guru
  • Stacy Robin – Managing Partner, The Degania Group
  • Pamela Robertson – Partner, Edwards, Angell Palmer & Dodge
  • Sarah Tavel – Analyst, Bessemer Ventures
  • Peggy Wallace – Co-Leader of NY Forum, Golden Seeds

So, “The Eggs” for the evening were four.

Smart Medical Consumer

We started with Dr. Banu Ozden and Smart Medical Consumer. The concept was “Web Services for consumers for their medical finances”. The mission is to help people besides being a profitable company. The target is the Consumer Health Care IT which is presently a hot topic, and the target audience was subscribers – people that use this site to hold medical expense data, and visitors for providing information.

There were a number of things that stood out for me. First, when presenting slides, use a larger font. A slide is a summary, not a resume on a page for example. Second, they are trying to base a revenue model on Internet advertising rather then a subscription model. This may be ok, when the advertising can be very selective, but as it was indicated they would be seeking their own advertising, a very time consuming task. For anybody that follows the Internet closely, you can’t solely based your revenue model on this and expect to predict accurately the results. While it appeared there were strong skills and experience in IT, there was a clear lack of Internet knowledge. The web site is very drab. While it’s serving a purpose and attracting generally a lower level of attendee, web sites get 3-5 seconds viewing time unless you have gone to the web site specifically. For this project the bulk of visitors are going to be people who stumble across the site.
A question raised by the panel, and discussed more with the people around me was security. The question was not answered satisfactory, infact it was not part of the presentation, it should have been. Security is critical, and the web is inherently insecure. This will be a hurdle I believe this site will not overcome.


FlyUpload – share your files, is a file sharing site. They offer nothing special, nothing new, nothing different. There was plenty of talk about being better, but in reality it was unclear. Combined with the founders being former founders of a previous startup, from which they purchased the IP, and is still in operation, an unclear advertising model, and unclear answers towards both the concerns of bandwidth and ultimate purpose. It appeared from the presentation, they were going to share your information as an advertising means, and also analyze your data for advertising opportunities. The presenter was unclear in a number of topics, and simply repeatably didn’t answer the questions posed by the panel.

Match My Pet

Prior to this presentation, there was a short break, however the slide for this site Match My Pet was put up. So, discussion was definitely more on this topic then the last. It lead to a number of jokes, like is this a dating site for pets etc.

The presentation of the idea was actually the best so far. The presenter gained the interest of the audience via some clever input, such as “P2P” (Pet-to-Pet), “B2B” (Breeder-to-Breeder) and P2B. They provided a one stop shop including GeoPet, PetStore, Recipet, Pricipet, Wikipet, and Vet911. It was clear from words of the presenter he was targeting “Breeding Services/Matching Services”, however a question from the panel was “Chapter 1 of the VC book – Focus” and it was clear there was not a clear focus. I also feel the presenter didn’t quite get the question in all dimensions. It also related to vertical. There was some talk about the horse breeding market, however it was also clear this was more as a by product rather then selectively researching and targeting this audience.

This project obviously started from having fun, and moving from there into a business. To be successful however, you need a plan, and to speak in front of potential VC’s you need a rock solid plan. This wasn’t that.

I think there is opportunity for the site “Pimp your Pet”. I also think, there is an opportunity for a “Hot on Not” for pets, obviously people love their pets.

License Sandbox

Augustine Fou of LicenseSandbox got it. I don’t know if HatchedBy seeded the presenters to a crescendo but they ended clearly on the best presentation.

His point was clear, his slides were minimalistic, yet portrayed his point clearly, also showing the fruit of this venture (i.e. photos), and he knew the VC talk. He used the right buzz words, and he knew his figures when asked by the panel. So what is.

“Find+License+Pay” An opportunity to target the non-professional content providers, what he calls “The Meaty Middle Opportunity” between the dollar bin and the expensive “getty type” service. While he quickly ran out of time (speakers get a 5 minute egg timer) there is clear potential, and they are targeting something unique, but clearly value adding. First by negotiating with the larger ad agencies, facilitating buyers and sellers and adding a layer of intelligence into the clear minefield of finding valuable content. In addition, for me he hit the spot. I am a serious photographer, everybody tells me I should sell my stuff, however I don’t want to have to be drawn away from what I enjoy, the taking photographs in order to market and sell my stuff. This could be a happy medium and I’m definitely going to try it out.

Augustine was also up on what’s hot, with his Moo Cards. I wonder however, how many people like me were able to know that immediately!

Closing Comments

I remember a TV show in Australia called The Dragons Den. A similar upmarket pitch your presentation for money, and the panel had money to burn.

The forum was excellent, but for me I’ve not got one idea at a stage. I actually need the help getting to the stage to being able to present to VC’s.

In addition, all presenters need to look closely at the presentations to learn how to present better, be better prepared, relate better, and also know how to answer questions. The personalities of the panel were different, you could interprete from the types of questions how the answers needed to be structured.

Top marks Yao, I think you are at the leading edge of a great concept here in NY.

Some ideas

Almost forgot to mention, here are some ideas.

  1. There was no opportunity for the audience to say anything, with the limited format this could be a disaster, however I propose two options.
    • There is a 5 minute breakout between each talk, allowing the audience to at least approach the speaker, give 15 seconds what they would like to contribute to, let the speaker hear them all, then decide to schedule more dialog. Interested parties get to hear all the 15 second inputs from the crowd.
    • More and more people will bring laptops (you will need more chair space), but savvy people can post comments very quickly and a moderator could collate, and say there is 5 minutes at the end of the moderator asking questions from the audience. This way it can be controlled via time. Very easy to leverage technology to do this. At a recent conference of 1500-1600 people, up to half the audience had laptops, in the smaller breakout rooms it would have been more then 50%
  2. I’d like to see more an ideas incubator, maybe more a town meeting type style. The problem is structure. It would need some consideration, the upcoming Hatch Match in August may be the place, will need to wait and see

Things that irk me!

As part of my job, I spend a lot of time assisting people when they are driving. But sometimes is can be trying.

  • People that type commands, make a mistake, then backspace over typed text (like 10-20 characters), only to have to retype the text again. Using bash for example, you can just arrow back, change then goto end of line, saving all that re-typing. And it’s so painful when they are slow typer’s.
  • People that have to use copy/paste (ie, mouse). But when they scroll up several pages to find a command, copy, then scroll down to paste when up arrow a few times in command history gives you the command you want. Even, simply removing the scrolling down (it’s not needed to paste) would save 1 of 4 operations.
  • Using copy/paste in vi, where your not in insert mode(so it doesn’t work properly, so they have to exit insert mode, copy, enter insert mode, paste, just to do it again)! People need to learn the power of Yank (Y or yy)

Google Street View Camera

Google Maps Street View Camera
There has been plenty of news about Google Maps Street View that I blogged about recently.

I wanted to know how they do it, well here is how they do it.
Google Maps zoom: here’s the device and vehicle behind it.

There has been plenty of news this week, images of tunnels in New York, which is apparently a security risk, a guy attempting to climb over a fence in front of an apartment block, a guy leaving a known sex shop.

Some articles I found interesting were Google Zooms In Too Close for Some and All-seeing Google Street View prompts privacy fears.

Innodb Monitoring I didn't know

Ok, so I knew about innodb_table_monitor and innodb_tablespace_monitor. I’ve tried them before, looked at the output and given up, partly because it didn’t serve the purpose I wanted it to at the time, and also because it’s format was a little cryptic.

What I didn’t know was there are actually 4 monitors via this “create table functionality”. You can also do innodb_monitor which is the same as SHOW INNODB STATUS, and you can also do innodb_lock_monitor .

Another thing I didn’t know is that these commands don’t send the output just once, it’s on a timer. I’ve found the timers to be different. For innodb_monitor you get every 15 sec, as well as a nice line given time of averages which seems to always say 16 seconds.

Per second averages calculated from the last 16 seconds

innodb_table_monitor was in this case 65 seconds??



The fact it goes to the MySQL error log is annoying, rather then a log for Innodb Monitoring. It’s much easier to script when you know the content coming into a file and can control it, with the MySQL error log you can’t.

More info at SHOW ENGINE INNODB STATUS and the InnoDB Monitors.

So you have to read the fine print in the MySQL Manual to get these things. If you also read the manual page to completion, which I did after I wrote this initial post you find yet another option innodb_status_file. If this is set for the server, it writes the output of SHOW ENGINE INNODB STATUS to a file every 15 seconds. What about the other outputs to file? Seems a need to be consistent here.

Things to ponder more.

Now were are all those useful tools that parse this output?
Is this output worth parsing and monitoring at any regularity?

Updated: 06/09

Two additional references that help in the understanding of these monitors are:

Log Buffer #47: a Carnival of the Vanities for DBAs

Log Buffer #47: a Carnival of the Vanities for DBAs
June 1st, 2007 – by Ronald Bradford

Welcome to the 47th edition of Log Buffer, the weekly review of database blogs. No time to wait, lets read more about this week’s database blogging activities.

The PostgreSQL Conference for Users and Developers wrapped up this week and Peter Eisentraut gives us a review including the lightning talks and wrap-up session with a charity auction in PGCon Day 4. Meanwhile Alex Gorbachev is at Miracle Scotland Database Forum – Day One, sounds like from his post there is a lot of drinking and tasting going on.

Tim Hall in Schema Owners and Application Users… starts with “I was trying to explain to a colleague the concept of using application users, rather than logging directly into the schema owner.” As he mentions, it’s an introductory topic however his article gives us a detailed discussion of implementation.

An OTN blog consolidates a number of announcements for Oracle, PHP and OPAL in it’s concise publication New PHP Doc and Software. Plenty of reading links here.

Paul Moen of Pythian covers an important step in Oracle: Standby Automatic File Management follows on from his earlier posts in creating a standby database.

Roland Bouman in MySQL User Defined Function Repository has made an effort to build a central repository of the varied UDF’s that exist to extend MySQL SQL Function syntax. His follow up article The MySQL UDF Repository: lib_mysqludf_sys shows just how dangerous these can really be.

Partitioning, a key new feature in MySQL 5.1 can be incorrectly used as Sergey Petrunia has described in Partition pruning tip: Use comparisons of column, not of partitioning function value. While 5.1 is not GA yet, and maybe not for some time Kevin Burton was quick to comment about software functionality about to be deployed with this new functionality. Of course partitioning in MySQL is free, in Oracle it is not as Mathias describes in Rant about partitioning licensing. Not only do you have to purchase the top of the line Enterprise Edition at 40K, you have to purchase an additional option on top of that. I didn’t know that.

Peter Zaitsev of the MySQL Performance Blog discusses Predicting how long data load would take. This is a very common problem particularly in a direct changeover of a production system to MySQL, and as with Peters example of loading 1TB, this is no longer an uncommon problem with MySQL.

Jag Singh from Optimize Data Warehouse uses a practical approach in Data type validation using regular expressions with a procedural language before data loading into a database, or performed within a database. It’s important we don’t lose site that some things are best done not in the database.

Slashdot starts a flame war with aptly titled 8 Reasons Not To Use MySQL (And 5 To Adopt It) with a reference between Five Compelling Reasons to Use MySQL and Eight Sound Reasons Not to Use MySQL, both published by CIO magazine on the same day. MySQL Performance Blog, and Curt Monash were quick to respond with MySQL – to use or not to use and Whether or not to use MySQL respectively.

Q&A Webinar Part 4 – MySQL Cluster by Ivan Zoratti gives us 33 points of reference with MySQL Cluster. Being involved with MySQL Cluster, it’s interesting to read the types of questions people are asking. Following up, Jonathan Miller a seasoned veteran with MySQL Cluster shows in Just when you think you know something that even the experts can be stretched with the new MySQL Cluster Certification Exam. Likewise I was surprised when I passed the exam myself recently of it’s complexity.

Andy Campbell in his blog Oracle Stuff I Should Have Known demonstrates he has too much spare time on his hand in understanding terminal colors in a novel but interesting Adding some colour to SQL*Plus. Would have looked better with some blue!

One of the features I promote in MySQL, and that exists in other RDBMS’, but Oracle does not have is native Multirow Inserts. Robert Vollman however provides Oracle’s two verbose alternatives. Only knew about one of them myself. Still they are far from the simplicity other database products have.

Brian Duff of DuffBuff writes If I had five Oracle wishes, they would be…. We all have wish lists and Brian writes, “I was thinking about things in the Oracle-sphere that I’d love to see happen over the next few years”. I won’t spoil his wish list, I think point 4 is a good one.

Using the OUTPUT and OUTPUT INTO clauses in SQL Server 2005 describes the new features for retrieving values that were just inserted/updated/deleted by a DML statement. Vardecimal Storage Format in SP2 discusses another new feature in SQL Server 2005, the new Vardecimal format. The caveat being that this functionality is restricted to the Enterprise and Developer Editions. One using SQL Server should also know about the Swiss-knife features in SQL Server to help you.

Things to take care before installing SQL Server 2005 on Vista operating system or Windows Server 2008?. Nothing more to say here, perhaps you need to read if your a SQL Server user.

Data Geek Gal Beth touches on one of my pet peeves in Data Quality: As Meets the Needs of an Organization. If data has a structure, appropriate rules should be put in place to ensure this is maintained in the database. You will forever be doing data cleansing if you simply bother to consider this basic step later. Developers should really learn to be smarter in this area. One point not discussed is, “Where are the validation rules applied, the application or the database?”

I’m not sure if a Log Buffer has had any images before, but I recently stopped at the Oracle HQ in Redwood city to snap this photograph. The irony of the experience not portrayed within the photograph was I worked for Oracle Corporation in the 1990’s but never visited the HQ, and while I was out taking this photo I was wearing a MySQL shirt. (No honking by any peak hour Oracle workers).

That’s it for this week, until next week. Clocks ticking Frank!

Google Maps adds real-time images

Yesterday I was amazed to see yet another new feature in Google Maps.

A new button [Street View], gives you are real view of the street in question, you can rotate the image around 360 degrees, and move up and down the street. Now not all areas have been mapped yet, but you will see highlighted in blue what is.

It’s rather amazing what they can do. Presently they have images of San Francisco, Las Vegas, Denver, Miami and New York.

BTW: On the news tonight (Wed May 30) there is already controversy over the feature. People have been able to zoom in and see great detail of people, sometimes in places were it may be inappropriate.

My ‘hourly’ MySQL monitor script Version 0.03

I realized when I released my very crappy version of My ‘hourly’ MySQL monitor script I really should have included my standard logging.

So I did that the night I wrote my original blog, but never published it. I’ve had need to use it again today, so a few more usability tweaks for parameterization and we are good to go.

Now Version 0.03 includes three files:

  • hourly.sh
  • common.sh
  • mysql.conf

Simple use is:

$ cd /directory
$ vi mysql.conf
# correctly specify MYSQL_AUTHENTICATION
$ chmod +x ./hourly.sh
$ nohup hourly.sh &

This gives you the following files

-rw-r--r-- 1 rbradford rbradford  2643 2007-05-29 15:47 mysql.innodbstatus.070529.154757.log
-rw-r--r-- 1 rbradford rbradford   414 2007-05-29 15:47 mysql.processlist.070529.154757.log
-rw-r--r-- 1 rbradford rbradford 12597 2007-05-29 15:47 mysql.status.070529.154757.log
-rw-r--r-- 1 rbradford rbradford 22229 2007-05-29 15:47 mysql.variables.070529.154757.log
-rw-r--r-- 1 rbradford rbradford 13146 2007-05-29 15:47 os.ps.070529.154757.log
-rw-r--r-- 1 rbradford rbradford   390 2007-05-29 15:48 os.vmstat.5.070529.154757.log

By default, written in /tmp, you can override by setting LOG_DIR.

It gives you a pile of output you can easily grep, I’m working on some very simple graphing. One thing I have done is pass the status into Mark Leith’s Aggregating SHOW STATUS Output as well as passed on some feedback that I hope will get integrated into later solutions.

For now, it’s a tool I can implement in a few seconds, run while somebody is showing or demonstrating a system, and I’ve got some meaningful information to look at. Combined with my more in-depth ‘minute’ script, a general-log and taking notes of individual steps in a system walk though, I have all the information I need to analyze a working system very quickly from a purely database level. Still there is lots to do manually, but I’ve got a consistent view of information to review.

My 'hourly' MySQL monitor script

Following my article Everything fails, Monitor Everything, and some inquiries, I’ve made some small modifications to my initially hourly script. This script is still a quick and dirty trial of what I’m wanting to develop, but in true Guy Kawasaki terms “5. Don’t worry, be crappy”. It works for now, and enables me to determine what works and what doesn’t.

My goals are Data Collection, Data Analysis and Data Presentation. This is the start of Data Collection. So now I get the following files:

  • os.vmstat.070524.122054.log
  • os.ps.070524.122054.log
  • mysql.innodbstatus.070524.122054.log
  • mysql.processlist.070524.122054.log
  • mysql.status.070524.122054.log
  • mysql.tablestatus.070524.122054.log
  • mysql.tablestatus.vertical.070524.122054.log
  • mysql.variables.070524.122054.log

#  Name:    hourly
#  Purpose: Script to 'cron' hourly to run for monitoring
#  Author:  Ronald Bradford

error() {
  echo "ERROR: $1"
  exit 1

[ ! -f $MYSQL_AUTHENTICATION ] && error "You must specific MySQL Authentication in $MYSQL_AUTHENTICATION"
[ -z `which mysqladmin` ] && error "mysqladmin must be in the PATH"


# run vmstat every second for 1 hour
# normally this is an overkill 5 seconds is acceptable
# but need to monitoring any spike

echo "INFO:  Logging vmstat $VMSTAT_OPTIONS to $LOG_FILE"
vmstat $VMSTAT_OPTIONS > $LOG_FILE 2>&1 &

echo "INFO:  Logging ps to $LOG_FILE"
ps -ef > $LOG_FILE 2>&1 &

echo "INFO:  Logging mysqladmin variables to $LOG_FILE"
echo "| date_time                        | $DATETIME |" > $LOG_FILE
mysqladmin $AUTHENTICATION variables >> $LOG_FILE 2>&1 &


echo "INFO:  Logging mysqladmin extended-status per $SLEEP_TIME secs for $MAX_COUNT times to $LOG_FILE1"
echo "INFO:  Logging mysqladmin processlist per $SLEEP_TIME secs for $MAX_COUNT times to $LOG_FILE2"
echo "INFO:  Logging mysql show innodb status per $SLEEP_TIME secs for $MAX_COUNT times to $LOG_FILE3"
while [ $COUNT -lt $MAX_COUNT ]
  echo "| date_time                        | $NOW |" >> $LOG_FILE1
  echo $NOW >> $LOG_FILE2
  echo $NOW >> $LOG_FILE3
  mysqladmin $AUTHENTICATION extended-status >> $LOG_FILE1
  mysqladmin --verbose $AUTHENTICATION processlist >> $LOG_FILE2
  COUNT=`expr $COUNT + 1`
  sleep $SLEEP_TIME
exit 0

So from here, I need to:

  • Put into my standard sh script framework which provides correct logging, message management and true parameterization.
  • Additional pre-checks for the correct security requirements
  • Revised Parameterised settings including database
  • Host and Instance logging
  • Additional file parsing for later Data Analysis and Data Presentation.

Got the next great web thing!

I joined a new meetup group New York Dot Com Hatchery on advice from my friend Marc. Now only if my schedule keeps in NY for the event I’ll be very much looking forward to going.

I think this statement on the www.hatchedby.us website sums up the opportunity for expert advice.

The payoff – Your friends won’t tell you the truth, we will. And we can make your billion-dollar dreams come true. Your odds for success? Better than buying a lottery ticket.

Cool Photo Printing Site

I just came across Moo as a link from Fotolog.

This site gives you the option to get a photo like card on the front, and text on the back. The interesting part is that you can do 100 different photos in the batch of 100, and it’s the size that grabs me. I’ve been looking for something different with my various sites including Admiring Creation and Heavy Horse Day and GeekErr. Just now trying to get all my photos ready for printing!

Using Perl with MySQL

NOTE: Problems presently exist, I’m seeking the expert help of the community and Perl Gurus

I have the need to do some quick benchmarking, I use MyBench as it’s effective in being able to plug in a query, some randomness and 2 minutes later (with a correctly configured Perl/MySQL environment) you have multi-threaded load testing.

However, when the environment you are on is not configured, and you only know the basics for Perl Operation and Installation, (code is just code, that’s the easy part) and the box is not accessible to the outside world say for cpan, it gets more complicated. I’ve attempted to install and configure DBI, DBD::mysql and Time::HiRes but without success.

DBI was straightforward, a download, make, install worked without issue. a make install was performed.

DBD::mysql didn’t need an compile, mysql.pm already existed and make said it was all up to date. However then running mybench gave the first error.

failed: Can't locate DBD/mysql.pm in @INC

Ok, so it wasn’t installed as ‘root’. Some small Perl Pathing.

PERL5LIB=~/DBD-mysql-4.004/lib;export PERL5LIB

Let to the next error:

failed: Can't locate loadable object for module DBD::mysql in @INC (@INC

Hmmm, a little more complicated. So going back to the compiling part, I realized I could force compile it, and this also confirmed one possible issue, the libmysqlclient library.

perl Makefile.PL
I will use the following settings for compiling and testing:
cflags (mysql_config) = -I/path/to/mysql/include
embedded (mysql_config) =
libs (mysql_config) = -L/path/to/mysql/lib -lmysqlclient -lz -lcrypt -lnsl -lm

Both the mysql.h and libmysqlclient software correctly located and valid, but still no luck.

Moving in parallel I then managed a SA that could install the rpm’s (being RHEL). Problem is, MySQL is not installed via RPM, so the only possible means of installing DBD::mysql is to force no dependencies. This did not prove successful be added clues to the problem.

failed: Can't load '/usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi/auto/DBD/mysql/mysql.so' for module DBD::mysql: libmysqlclient.so.14: cannot open shared object file: No such file or directory at /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/DynaLoader.pm line 230.

So it used the installed DBD::mysql, and it couldn’t find the expected library path. This gives some confident that perhaps the earlier version is right, and that some other dependancy is missing.

I’ve not found any good resource to work though this online in my environmental situation, but surely this is pain that somebody else has experienced.

Thanks for those friends that have already contributed small parts to getting someway here, however it just ain’t working, and I need it to just work.

Any input appreciated.

New Category – Clever Design

I am sometimes really impressed by Clever Design in print media or TV media. Over the years I’ve even collected piles of ideas, who knows where they all are now. Time to consolidate into one area, a new category on my Blog aptly named “Clever Design”. A place where I can put ideas, images, marketing, TV ads etc that have had a positive effect on me anyway. Stay tuned!

Website of the Day – Slideshare

I came across an interesting site while reading World’s Best Presentation Contest Winners Announced by Guy Kawasaki called SlideShare.

It’s a happy medium between the bulk of image sites like Flickr and Yahoo Photos and video sites like Revver and YouTube where you can easily add Text to what you are wanting to say in a Slide Show. Interestingly enough, like most Web 2.0 Communities people will come up with ideas you never considered, for example check out Evangeline Lilly where this is effectively a portfolio photo shoot of an actress. Clever.

Everything fails, Monitor Everything

From the recent MySQL Conference a number of things resonate strongly almost daily with me. These included:

  1. Guy Kawasaki – Don’t let the bozos grind you down.
    • Boy, the bozos have ground me down this week. I slept for 16 hrs today, the first day of solid rest in 3 weeks.
  2. Paul Tuckfield – YouTube and his various caching tip insights.
    • I’ve seen the promising results of Paul Tuckfield’s comment of pre-fetching for Replication written recently by Farhan.
  3. Ramus – SSL is not secure — This still really scares me.
    • How do I tell rather computer illiterate friends about running multiple browsers, clearing caches, never visiting SSL sites after other sites that are insecure etc.
  4. Everything fails, Monitor Everything – Google

What I’ve been working on most briefly lately, and really want to be far more prepared everywhere I go is Monitor Everything.

It’s so easy on site to just do a vmstat 1 in one session and a mysqladmin -r -i extended-status | grep -v ” | 0 “ in another, and you may observe a trend, make some notes, say 25% CPU, 3000 Selects, 4000 Insert/Updates per second etc, but the problem is, the next day you don’t have actual figures to compare. What was the table_lock_waits yesterday, they seem high today.

I also only found a problem on a site when I graphed the results. I’ll give you a specific example. The average CPU for the system was 55%, the target was 50%. When graphing the CPU, it was plainly obvious something was not right. I could see with extremely regularity (and count 12 in one hour) a huge CPU spike for a second or two. It was so regular in the graph it was not possible it was random. So, after further investigation and testing, a 5 minute job on this production server (and not on previous testing servers) took 25% CPU for a second or two, and a huge amount of Page Faults. Did it effect the overall impact of the performance of the system. I don’t know, but it was a significant anonomoly that required investigation.

So, quite simply, always monitor and record so you can later reference, even if you don’t process the raw figures at the time. The question is then, “What do I monitor”. Answer, monitor everything.

The problem is with most monitoring, e.g. vmstat and mysqladmin is the lack of a timestamp for easy comparison. It’s really, really annoying that you can add this to the line output. The simple solution is to segment your data into both manageable chunks and consistent chunks.

Manageable chucks can be as easy as creating log files hourly, ensuring the start exactly at the top of the hour. Use a YYYYMMDD.HHMMSS naming standard for all files and you can never go wrong.
Consistent chunks is to ensure you start all manageable monitor (e.g. hourly) at the exact same time, so you can compare.

You need to monitor at least the following:

  • vmstat
  • mysqladmin extended-status
  • mysqladmin processlist
  • mysqladmin variables
  • mysqladmin -r -i [n] extended-status | grep -v ” | 0 “

I haven’t found an appropriate network monitoring, but you should also at that.

The issue here is frequency. Here are some guidelines. vmstat every 5 seconds. extended-status and processlist every 30 seconds, variables every hour, and extended-status differences is difficult, but it saves a lot of number crunching later for quick analysis. I do it every second, but not all the time, you need to work out a trigger to enable, or to say run it for 30 seconds every 15-30 minutes.

So in one hour I could have:

  • 20070519.160000.os.vmstat.log
  • 20070519.160000.mysql.variables.log
  • 20070519.160000.mysql.status.log
  • 20070519.160000.mysql.processlist.log
  • 20070519.160500.mysql.status.increment.log, then 1610, 1620, 1630 etc

I have my own scripts for monitoring under development, and I’ve been revising slowly, particularly to be able to load data into a MySQL database so I can easily use SQL for analysis. One thing I actually do is parse files into CSV for easy loading.

There are two tools out there that I’m reviewing and you should look at. Mark Leith has written a Aggregated SHOW STATUS stat pack, and there is also tool called mysqlreport. These both go some what to ultimately what I want.

I haven’t used it yet, but I’ve seen and been very impressed with the simplicity of Munin for graphing. I really need to get some free time to get this operational.

So Monitor Everything and Graph Everything. Plenty of work to do here.

Reading the right MySQL Manual

I learned an extremely valuable lesson today on a client site. It’s important that users of MySQL read the right version of the manual for the product they are using. It’s very easy to just goto http://dev.mysql.com/doc/ which is what I type in directly and browser the manual. While the MySQL Manual has separate sections for 4.x, 5.0, 5.1 etc, the 5.0 Manual for example reflects the most current version of MySQL 5.0. You may not be running the most current version, infact most production systems rarely run the current version.

My specific case was with Connector/J (JDBC) Reference of 5.0.4. The manual pages reflects the new 5.0.5 or todays’ 5.0.6 release and a particular default is now a different value. With Connector/J the docs are bundled with the version. The MySQL Community Server product does not bundle the manual, and I don’t know where to view instances of the MySQL manual for each specific dot release!

The MySQL Conference recap

I recently had the opportunity to return and speak at the Brisbane MySQL Users Group. I spent some time talking about MySQL User Conference 2007 Summary and Life as a Consultant. My summary of included:

  • Overview
  • Keynotes
  • Marten Mickos – MySQL
  • Guy Kawasaki
  • Michael Evans – OLPC
  • Rasmus Lerdorf – PHP
  • Paul Tuckfield – YouTube
  • Community Awards
  • Product Road Map
  • Google
  • Storage Engines
  • Dorsal Source
  • What’s Next

One question was posed to me. “What new did MySQL do this year?” being from the last User Conference. MySQL did seem to not make a great impact at the conference over the successes of the previous year. I had to think some time to come up with the following list.

And most recently:

  • Open Source Database Vendor Partners with LINBIT to Jointly Promote & Support DRBD for MySQL Enterprise Read More
  • IBM DB2 as a Certified Storage Engine for MySQL on System i Read More

It’s hard to say if these are big ticket items or not, but it is definitely disappointing that 5.1 GA is still MIA. We stay tuned.

I also managed a much better response then from my Conference Presentation opening Slide.

“How can you tell an Oracle DBA has touched your MySQL Installation?”

mysqld_safe –user=oracle &