SHOW PROFILE

I’ve been playing more with the SHOW PROFILE command available as part of MySQL Community 5.0.37. Thanks to the author Jeremy Cole. This command can provide some interesting insight into the workings of MySQL. Like most new software, it also prompts users to suggest new features and functionality. Here is my wish list. Some I figure are practical, some are pure wish list, but if you never ask you never know.

  1. The unit of measure for duration is seconds. It would be great if the display could be tuned, say to milliseconds. If you look at my first example, all figures are effectively at millisecond or even microsecond granularity.
  2. I would like to see a total for the query. Again in Example 1, you have to add up all the figures to determine this query took 8ms.
  3. Again in Example 1, perhaps a percentage of total time for each line duration may be helpful.
  4. More descriptive status descriptions (this is part of the MySQL internal code and not the patch)
  5. SET PROFILING=1; can only be set on the current session, making it impossible to easily monitor a JDBC multi-connection test. There needs to be a way to enable profiling for a session other than the current interactive one, while still being able to see the results. You can’t do a SHOW PROFILE via a JDBC connection!
  6. I’d like to see a threshold, so queries running under the threshold are discarded, much like the long-query-time option. This would enable you to run a large number of SQL statements and log profiles only for the longer running ones.
  7. I’d like to see a level of logging to file, again like the slow query log, so you can simply gather information on a working system and review it at some later time. Combined with the previous point, you would have a microsecond slow query log with explicit details.

One major benefit of the SHOW PROFILE command is that I can accurately get a figure for how long a query is taking (down to milliseconds). You just have to sum all the figures (see wish list point 2).
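
A minimal session sketch, assuming the INFORMATION_SCHEMA.PROFILING table that ships with the same patch (the SELECT shown is illustrative, and the query id comes from SHOW PROFILES):

mysql> SET PROFILING=1;
mysql> SELECT COUNT(*) FROM mysql.user;
mysql> SHOW PROFILES;
mysql> SELECT SUM(DURATION) AS total_seconds
    ->   FROM INFORMATION_SCHEMA.PROFILING
    ->  WHERE QUERY_ID = 4;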

By default, the source details are not provided; you need to specify the SOURCE operand, which helps both in comparing with any debugging output and in tracing through the code. As in Example 1, I needed to find why 95% of the time was in a step with the most descriptive status of ‘end’.

Example 1

mysql> show profile SOURCE,MEMORY for query 4;
+--------------------+------------+-----------------------+---------------+-------------+
| Status             | Duration   | Source_function       | Source_file   | Source_line |
+--------------------+------------+-----------------------+---------------+-------------+
| Opening tables     | 0.00013200 | open_tables           | sql_base.cc   |        2106 |
| System lock        | 0.00001800 | mysql_lock_tables     | lock.cc       |         153 |
| Table lock         | 0.00000600 | mysql_lock_tables     | lock.cc       |         162 |
| init               | 0.00001300 | mysql_select          | sql_select.cc |        2073 |
| optimizing         | 0.00004800 | optimize              | sql_select.cc |         617 |
| statistics         | 0.00002500 | optimize              | sql_select.cc |         773 |
| preparing          | 0.00005200 | optimize              | sql_select.cc |         783 |
| executing          | 0.00002200 | exec                  | sql_select.cc |        1407 |
| Sending data       | 0.00000500 | exec                  | sql_select.cc |        1925 |
| end                | 0.00786600 | mysql_select          | sql_select.cc |        2118 |
| query end          | 0.00001400 | mysql_execute_command | sql_parse.cc  |        5085 |
| freeing items      | 0.00000700 | mysql_parse           | sql_parse.cc  |        5973 |
| closing tables     | 0.00001900 | dispatch_command      | sql_parse.cc  |        2120 |
| logging slow query | 0.00001000 | log_slow_statement    | sql_parse.cc  |        2178 |
| cleaning up        | 0.00000500 | dispatch_command      | sql_parse.cc  |        2143 |
+--------------------+------------+-----------------------+---------------+-------------+
15 rows in set (0.01 sec)

Example 2

We were experiencing increased latency in JDBC with a particular query. On a recommendation from MySQL Support we tried SET SQL_BUFFER_RESULT=1;

mysql> show profile source for query 14;
+------------------------------+------------+-------------------------+---------------+-------------+
| Status                       | Duration   | Source_function         | Source_file   | Source_line |
+------------------------------+------------+-------------------------+---------------+-------------+
| Opening tables               | 0.00006025 | open_tables             | sql_base.cc   |        2106 |
| System lock                  | 0.00004875 | mysql_lock_tables       | lock.cc       |         153 |
| Table lock                   | 0.00000400 | mysql_lock_tables       | lock.cc       |         162 |
| init                         | 0.00001600 | mysql_select            | sql_select.cc |        2073 |
| optimizing                   | 0.00005675 | optimize                | sql_select.cc |         617 |
| statistics                   | 0.00001250 | optimize                | sql_select.cc |         773 |
| preparing                    | 0.00005175 | optimize                | sql_select.cc |         783 |
| Creating tmp table           | 0.00001275 | optimize                | sql_select.cc |        1206 |
| executing                    | 0.00006025 | exec                    | sql_select.cc |        1407 |
| Copying to tmp table         | 0.00000400 | exec                    | sql_select.cc |        1547 |
| converting HEAP to MyISAM    | 0.04820900 | create_myisam_from_heap | sql_select.cc |        9914 |
| Copying to tmp table on disk | 0.04049075 | create_myisam_from_heap | sql_select.cc |        9968 |
| Sending data                 | 1.29302000 | exec                    | sql_select.cc |        1925 |
| end                          | 0.09398425 | mysql_select            | sql_select.cc |        2118 |
| removing tmp table           | 0.00004975 | free_tmp_table          | sql_select.cc |        9856 |
| end                          | 0.00089125 | free_tmp_table          | sql_select.cc |        9884 |
| query end                    | 0.00001850 | mysql_execute_command   | sql_parse.cc  |        5085 |
| freeing items                | 0.00000825 | mysql_parse             | sql_parse.cc  |        5973 |
| closing tables               | 0.00003425 | dispatch_command        | sql_parse.cc  |        2120 |
| logging slow query           | 0.00001325 | log_slow_statement      | sql_parse.cc  |        2178 |
| cleaning up                  | 0.00000675 | dispatch_command        | sql_parse.cc  |        2143 |
+------------------------------+------------+-------------------------+---------------+-------------+
21 rows in set (0.00 sec)

The ‘converting HEAP to MyISAM’ and ‘Copying to tmp table on disk’ lines indicated that the temporary table was being flushed to disk, so we added SET SESSION tmp_table_size=20*1024*1024;

mysql> show profile source for query 18;
+----------------------+------------+-----------------------+---------------+-------------+
| Status               | Duration   | Source_function       | Source_file   | Source_line |
+----------------------+------------+-----------------------+---------------+-------------+
| Opening tables       | 0.00006050 | open_tables           | sql_base.cc   |        2106 |
| System lock          | 0.00001250 | mysql_lock_tables     | lock.cc       |         153 |
| Table lock           | 0.00000400 | mysql_lock_tables     | lock.cc       |         162 |
| init                 | 0.00000775 | mysql_select          | sql_select.cc |        2073 |
| optimizing           | 0.00005475 | optimize              | sql_select.cc |         617 |
| statistics           | 0.00001225 | optimize              | sql_select.cc |         773 |
| preparing            | 0.00005075 | optimize              | sql_select.cc |         783 |
| Creating tmp table   | 0.00001350 | optimize              | sql_select.cc |        1206 |
| executing            | 0.00006125 | exec                  | sql_select.cc |        1407 |
| Copying to tmp table | 0.00000375 | exec                  | sql_select.cc |        1547 |
| Sending data         | 0.29110925 | exec                  | sql_select.cc |        1925 |
| end                  | 0.08023800 | mysql_select          | sql_select.cc |        2118 |
| removing tmp table   | 0.00001525 | free_tmp_table        | sql_select.cc |        9856 |
| end                  | 0.05971400 | free_tmp_table        | sql_select.cc |        9884 |
| query end            | 0.00001925 | mysql_execute_command | sql_parse.cc  |        5085 |
| freeing items        | 0.00000425 | mysql_parse           | sql_parse.cc  |        5973 |
| closing tables       | 0.00004625 | dispatch_command      | sql_parse.cc  |        2120 |
| logging slow query   | 0.00000800 | log_slow_statement    | sql_parse.cc  |        2178 |
| cleaning up          | 0.00000300 | dispatch_command      | sql_parse.cc  |        2143 |
+----------------------+------------+-----------------------+---------------+-------------+
19 rows in set (0.00 sec)

Google's worst nightmare

Today while waiting at the airport, I took a look at the news stand, and right there on the cover of Fast Company were two words: Google and Wikipedia. Given Wikipedia is a poster boy of MySQL it was an immediate purchase just to see what was being said.

The title on the cover was Google’s worst nightmare – Wikipedia’s Jimmy Wales has his sights set on the search business.

Now, I often use Wikipedia to search for things directly rather than using Google. I’ve found it usually to be more accurate, particularly on topics I know it will contain. The article’s references to disappointed search users, with Google and Yahoo tied at 2.3 out of 5 in user satisfaction, hit home for me as well, and that’s exactly the idea behind www.wikia.com.

The article states “Google doesn’t have an effective method of locking in its customers the way earlier info-tech leaders did”. Well I’d agree in relation to searching, but Google is rapidly releasing new products, new features and acquiring good ideas. It has the wealth (money and gene pool) to make an impact.

Some more on www.wikia.com: “Wikia offers free MediaWiki hosting for your community to build a free content wiki-based website. Find out why you should use Wikia and request a new wiki in any language or see our list of wikis to visit an existing one.”

Today it has 69 categories; I’m sure tomorrow it will be more. Today’s featured page is The Marvel Database; others like Wookieepedia – The Star Wars encyclopedia and hundreds of entries in the Games category alone make it something to look at one of these days when I have free time.

And best of all it runs the LAMP Stack. Linux, Apache, MySQL, PHP.

Get behind a new exciting site

As I write this blog I have over 90 draft blog posts. That’s 9-0. Why do I have so many? The main reason is I want to say something, and I’ve either not completed it or not researched it sufficiently to consider the entry complete.

This frustrates me, as sometimes I just want to get the word out on something, share my opinion, or mention something great I’ve discovered. I do it for me; I don’t really care if anybody actually reads my stuff, but I’m surprised sometimes when I get comments at how people actually get to see my blog.

JotThat is a surprisingly simple yet brilliant idea. It’s quite simply a site for making Jots: quick notes, passing comments, thoughts, things you want to either remember or say, in simple Jot form.

What makes JotThat in my eyes? Well it’s simple, and I strive for simple. It’s extremely easy to use. It’s built on MySQL, so that gains my attention. It’s something just a little different from everything you see each day.

I especially like the new feature of being able to email your Jots to your account. I was at the train station tonight, and later at dinner I got some great input from somebody. I wanted to write it down (I forget things a lot more these days), so instead of adding it to notes in my PDA, I simply emailed it to my JotThat account. Easy as!

The site is still in its early stages, but please get behind the project. I always like to get referrals from people on everything from websites to products, even services like a good dentist. I recommend you check this one out, as well as supporting the MySQL open source community.

MySQL LOAD DATA Trick

I learned a new trick today with LOAD DATA INFILE. I’m migrating some data from an external source, and the date format is not the MySQL-required YYYY-MM-DD; it is DD-MMM-YY. So how do you load this into a DATE field?


$ echo "02-FEB-07" > /tmp/t1.psv
$ mysql -umysql
USE test;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1(d1 DATE);
# echo "02-FEB-07" > /tmp/t1.psv
LOAD DATA LOCAL INFILE '/tmp/t1.psv'
INTO TABLE t1 (@var1)
SET d1=STR_TO_DATE(@var1,'%d-%M-%y');
SELECT * FROM t1;
EXIT

The trick is to bind the appropriate column within the file being loaded to a variable (@var1 in my example) and use the SET syntax to perform a function on the variable. Rather cool.
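
The same technique extends to multi-column files; a hedged sketch, with the file layout and column names purely illustrative:

LOAD DATA LOCAL INFILE '/tmp/orders.psv'
INTO TABLE orders
FIELDS TERMINATED BY '|'
(order_id, @order_date, amount)
SET order_date = STR_TO_DATE(@order_date,'%d-%M-%y');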

A good tip to know.

That missing INNODB STATUS

On Thursday I saw something I’d not seen before: an empty InnoDB status. Given the amount of output normally shown, it was certainly a first. It looked like this:

mysql> SHOW ENGINE INNODB STATUS;
+--------+------+--------+
| Type   | Name | Status |
+--------+------+--------+
| InnoDB |      |        |
+--------+------+--------+
1 row in set (0.03 sec)

To answer some of the most obvious questions:

  • Yes it was a working existing MySQL instance, with InnoDB correctly configured. Indeed we had been benchmarking for several hours.
  • MySQL Server was running, indeed a command selecting data from the mysql schema worked just fine after seeing this (All other tables were Innodb).
  • Absolutely nothing in the host MySQL error log. (This was the second most disappointing aspect)
  • The process list showed two queries that had been running for some time; everything was taking > 1 second. (This was the most disappointing)

So the problem is that MySQL had effectively hung when dealing with queries solely on InnoDB tables. Closer investigation found that another application process had filled the /tmp file system. Reclaiming space didn’t cause MySQL and InnoDB to start operating again. Even a shutdown of MySQL failed, with mysqld having to be killed manually.

For those super inquisitive the version was 5.1.16-ndb-6.2.0-log, and yes it is a Cluster release. I’ve yet to test the problem on a normal 5.1 version and log a bug appropriately if it exists.

I suspect in our benchmark we definitely need to include some timeout handling so the queries would fail (they were both UPDATEs), but it did have the customer asking why, to which there was no answer.

Watching Replication in action

For all those instant GUI people out there, there is an easy way to watch the present status of your MySQL Slaves using the watch command.

$ watch -n 1 -d "mysql -uroot -pxxxx mysql -e 'SHOW SLAVE STATUS\G'"

The watch command provides a view of a file or command and shows interval updates to its output (the -n <seconds> option). You can also specify a granularity finer than one second, for example 0.5. -d also highlights the differences for you. So while you see the following output from your SHOW SLAVE STATUS, on a loaded system you will also see bin-log and relay-log changes, and perhaps Seconds_Behind_Master.

The question is, Why is Seconds_Behind_Master the last column in this display?


*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: localhost
Master_User: repl
Master_Port: 10002
Connect_Retry: 60
Master_Log_File: master-bin.000006
Read_Master_Log_Pos: 102
Relay_Log_File: newyork-relay-bin.000055
Relay_Log_Pos: 244
Relay_Master_Log_File: master-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 102
Relay_Log_Space: 539
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
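
If replication lag is your main interest, a variation using standard shell tools filters the display down to the interesting lines:

$ watch -n 1 -d "mysql -uroot -pxxxx mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Running|Seconds_Behind'"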

Smarter indexing for column LIKE '%string%'

With my very heavy travel and skilling load I’ve not had time to scratch myself. That hasn’t stopped the brain working overtime on various issues, including the classic: find a pattern in a string starting with a wildcard character. On a recent gig I saw the true classic.


SELECT columns
FROM users
WHERE username LIKE '%str%'
OR firstname LIKE '%str%'
OR lastname LIKE '%str%'

I went through the various options and comments on leading ‘%’, ORs, combined columns, FULLTEXT (which doesn’t work in this case), merge indexing etc. However it perplexed me that nobody has really solved this problem, or at least shared their solutions.

I have an idea, a theory and while I’d love to prove/disprove it, I simply just don’t have the time. So here are my notes, hopefully somebody can comment positively/negatively, do some research, or encourage me to pursue it more.

The Problem

The problem is quite simply the leading wildcard. Der! So how do you eliminate the wildcard from the search?

My Idea

Working with the earlier example, and having a concatenated field of the three components (username, firstname, lastname), my idea is to encode the string into 27 bits: a bit for each alphabetic character (A-Z) found, and any non-alphabetic character except whitespace in bit 27.

This column is then indexed and searched for bitwise matches. Here is a pictorial description.

String             a b c d e f g h i j k l m n o p q r s t u v w x y z
Ronald Bradford    1 1   1   1           1   1 1     1
Search: 'brad'     1 1   1                           1                  *** MATCH ***
Search: 'fred'           1 1 1                       1                  NO MATCH

My idea is very simple; the question is, will it actually work?

The Questions

The goal is quite obviously to get index utilization that reduces the matching rows to a maximum of say 10%-25%. Factors that would affect this include:

  • If the search string is only one character, like a vowel, this would not be useful, so an implied criterion for optimal use is at least 2 or 3 characters.
  • I see the effectiveness being lost on large column values; it could be that only up to say 30 characters is optimal, and strings of 100+ characters would be ineffective.
  • It doesn’t support case-sensitive searching or non-ASCII searching

The Tests

  • Create good sized table and distribution of first/last names (A IMDB Actors table would work)
  • Create a concatenated column of searchable fields (e.g. details = CONCAT_WS(' ', firstname, lastname))
  • Create a function to return the bitwise search value (e.g. bitize(str)); a sketch follows this list
  • Create Indexed bitwise column and pre-populate accordingly (e.g. bitfulldetails)
  • Create BEFORE INSERT|BEFORE UPDATE Triggers to populate Indexed bitwised column
  • Test the sucker (of course you will need to include the actual LIKE command as well in the WHERE clause)
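
Here is a hedged sketch of what bitize() might look like as a stored function; the name comes from my list above, but the implementation is purely illustrative. Note that a match requires all bits of the search string to be present, which the sample query tests with an equality on the bitwise AND.

DELIMITER //
CREATE FUNCTION bitize(str VARCHAR(255)) RETURNS INT UNSIGNED DETERMINISTIC
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE bits INT UNSIGNED DEFAULT 0;
  DECLARE c CHAR(1);
  WHILE i <= CHAR_LENGTH(str) DO
    SET c = LOWER(SUBSTRING(str, i, 1));
    IF c BETWEEN 'a' AND 'z' THEN
      SET bits = bits | (1 << (ASCII(c) - ASCII('a')));  -- one bit per letter a-z
    ELSEIF c <> ' ' THEN
      SET bits = bits | (1 << 26);                       -- bit 27 for non-alphabetics
    END IF;
    SET i = i + 1;
  END WHILE;
  RETURN bits;
END//
DELIMITER ;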

A sample query would then be:


SELECT columns
FROM users
WHERE (bitize('str') & bitfulldetails) = bitize('str')
AND fulldetails LIKE '%str%'

So the challenge to all those budding MySQL Gurus, does it seem plausible?

Using Innodb Primary Keys wisely

At a customer site recently I came across the following table definition for an InnoDB table: 140 columns, a 3-part primary key totalling 44 bytes, and 2 million rows giving over 900MB in data size. It had 15 indexes, totalling over 2.3GB in size. Add into the mix a Windows Server 2003 OS, a 640MB innodb_buffer_pool_size setting and table scans out the wazoo. This all leads to very poor performance.

It is generally considered best practice for InnoDB to use the shortest primary key possible, and there is a clear reason for this: InnoDB stores the full primary key with every secondary index entry. So for example if an indexed column was 4 bytes in length, in this case the index row would be 48 bytes (before overhead). Fortunately an easy solution presented itself. Because of this index storage requirement, InnoDB will create an internal 6-byte primary key if none exists for a table. I had of course known about this but had never tried it in a production situation. I come from the old school where every table is defined with a primary key.

So an ALTER TABLE [name] DROP PRIMARY KEY resulted in a long wait, and a reduction in the index size to 900MB. WOOT! Now, on closer analysis the primary key is the primary key because it’s the unique requirement for the table. No problem, I just add a unique key to replace the previously dropped primary key. A check of the index size then showed 2.3GB. What the!

It seems if you read the fine print of the MySQL documentation for InnoDB table structures there is an exception clause: “If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.”

In my situation, the unique key I added was in turn converted internally into the new primary key. Drat! So, to the InnoDB developers out there: I’d like to see a way for the internally generated key to remain in this situation, or at least the ability for the designer to choose this behaviour.

The only solution is to physically create an INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY column definition. This may not seem like a big thing, but the customer did not want to make any schema changes. However it was a necessity, simply as the first step to getting reasonable performance.
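
A minimal sketch of the change, with the table and key column names purely illustrative:

ALTER TABLE customer_detail
  DROP PRIMARY KEY,
  ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ADD UNIQUE KEY uk_customer (part1, part2, part3);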

This was only the first part of the problem. Solving the full table scans via creative indexing was not possible, and code changes were out of the question for an immediate fix. The real problem was the size of the innodb_buffer_pool_size; it was just too small to hold both the table data and index pages in memory. Perfmon analysis showed the machine simply went into intensive disk I/O every time any queries were run. On a Linux system with just a database and InnoDB tables, 70-80% of available memory can be recommended for the innodb_buffer_pool_size.

Alas this is Windows 32-bit, and there is an implied 2GB memory limit for any process, so the best one could manage in this situation was 1600MB.
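
In my.cnf terms, a sketch of the setting, with the value reflecting the 32-bit ceiling just described:

[mysqld]
# ~2GB per-process limit on 32-bit Windows; leave headroom for other buffers and threads
innodb_buffer_pool_size=1600M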
The long and the short of it was that even with poor database design, an immediate performance improvement occurred with an optimal primary key and sufficient memory allocated to the correct MySQL buffers. This is only a temporary fix for a greater problem.

MySQL Camp T-Shirts


For those that attended the MySQL Camp at Google HQ late last year you may have seen me with my own T-Shirt designs. A number of people inquired about getting them. I’ve finally got around to make them available online, so anybody that would like one can order online.

There are two different shirts. If you want your name on the shirt, you need to make sure you choose the correct one.

  • Early Adopters – For those that were the first 48 that signed up, your name as well as position and company are on the shirt.
  • The Herd – For everybody that registered on the website, your name is on the shirt.

Ok. I’ve already been asked why 48. This was the number of registrants when I got the shirt made back in Australia a week or so before the Camp.

There are also plenty more of my MySQL designs at my MySQL Wear Store.

For those that also liked the runner-up pin “A mouse is a waste of a perfectly good hand”, you can also get this in its original graphical shirt design at Geek Cool – CLI.

MySQL Predictions for 2007

I’m interested to know: what do people think 2007 will hold for MySQL?

The announcement of “You” as Time Person of the Year can only be considered a huge boost to the opportunities in 2007. So, here are my 7 predictions for 2007 (in no significant order).

  1. 2007 will be the year of the storage engine. We will see 5 offerings for transactional storage engines, and 20+ available and practical engines for the management of some form of data.
  2. 2007 will see MySQL 5.1 GA (finally).
  3. 2007 will see MySQL release its own Falcon Storage Engine (GA not until Q4 :-().
  4. The MySQL Winter of Code will enable the contributions of the community to change feature development. I foresee a Bounty system from an external party or parties for MySQL Features emerging.
  5. MySQL will make major press inroads to the RDBMS Big 3 of Oracle, SQL Server and IBM DB2.
  6. Despite the efforts of MySQL AB, major installations of MySQL 4.0 and 4.1, including large ISPs, will hamper the uptake of 5.0 and 5.1 and the de-commissioning of 4.x.
  7. A major country government will make an announcement to move to Open Source across servers and desktops, and MySQL will contribute to being an enterprise database offering in systems replacements as part of a longer term strategy.

Pluggable Storage Engines – What is the potential?

I started this post a month ago, but after Kaj’s discussion on the same topic at the MySQL Camp I figured it was time to post.

I had dinner with a friend recently (a very smart friend), and our conversation led him to ask “What’s different about MySQL?”. One of the things I tried to describe was the “Pluggable Storage Engine Architecture” (PSE), a potential for the future that I expect will set MySQL apart from all other open source and even commercial databases.

Here are some details of the example I tried to provide, given somebody who understands enough of the general principles of RDBMS’s.

Consider the possibility that information (intelligent data) is available within a relational database via the appropriate tools and language (e.g. SQL), but is not physically constrained to tables, columns and rows of data with an application to manage that data, which is the present traditional approach. Let’s use the images that you take with your digital camera as an example.

In a typical RDBMS application you would create an application to manage the content of your data, with a number of tables, links to the images etc. Of course you would need an application as well to both view and manage this information.

What if you simply pointed your database at a directory of images and were then able to query information such as photos by date, by size, by album, from a certain location, or with a given keyword? Most of this information about digital photographs is already there, encoded in the Exif format that is embedded within JPEG images.
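
To make the idea concrete, a purely hypothetical query against such an engine; the table and column names are invented for illustration:

SELECT file_name, date_time_original, focal_length
FROM photos
WHERE date_time_original BETWEEN '2006-02-01' AND '2006-02-28'
ORDER BY date_time_original;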

So what’s missing from this information? Tags and comments are the most obvious, because this information can’t be determined electronically; it is something that humans add. If you could also embed this information into an image with a suitable tool then you would be ready to manage your photos.

A further extension would be to have Image Analysis capabilities that enabled you to search for photos that contained the sky, or people, or something that was the color red.

What if in the future your camera had a built-in GPS and this information were recorded within Exif? The ability to link your output to popular online mapping software such as Google Maps would then exist. You could use your digital camera to track your moves, taking photos that could plot your path over a holiday, and also enable location-based queries.

It was interesting to postulate what ideas may be possible in the future. I suspect that it won’t be long before we actually see this. So what are the other potentials you may not have considered? Another example may be an MP3 jukebox style PSE, managing all the information held within the ID3 tags of MP3s, allowing you to do with music what could be done with images.

References

Exif Example

Here is some example content of Exif using EXIF Tool

 ./exiftool ~/Desktop/2006_02_23_AirShow/IMG_5966.JPG
ExifTool Version Number         : 6.50
File Name                       : IMG_5966.JPG
Directory                       : /home/rbradfor/Desktop/2006_02_23_AirShow
File Size                       : 2 MB
File Modification Date/Time     : 2006:09:24 17:44:32
File Type                       : JPEG
MIME Type                       : image/jpeg
Make                            : Canon
Camera Model Name               : Canon EOS 300D DIGITAL
Orientation                     : Horizontal (normal)
X Resolution                    : 180
Y Resolution                    : 180
Resolution Unit                 : inches
Modify Date                     : 2006:02:23 16:01:56
Y Cb Cr Positioning             : Centered
Exposure Time                   : 1/320
F Number                        : 10.0
ISO                             : 200
Exif Version                    : 0221
Date/Time Original              : 2006:02:23 16:01:56
Create Date                     : 2006:02:23 16:01:56
Components Configuration        : YCbCr
Compressed Bits Per Pixel       : 3
Shutter Speed Value             : 1/320
Aperture Value                  : 10.0
Max Aperture Value              : 3.5
Flash                           : No Flash
Focal Length                    : 18.0mm
Macro Mode                      : Unknown (0)
Self-timer                      : 0
Quality                         : Fine
Canon Flash Mode                : Off
Continuous Drive                : Single
Focus Mode                      : AI Focus AF
Canon Image Size                : Large
Easy Mode                       : Manual
Digital Zoom                    : Unknown (-1)
Contrast                        : +1
Saturation                      : +1
Sharpness                       : +1
Camera ISO                      : n/a
Metering Mode                   : Evaluative
Focus Range                     : Not Known
AF Point                        : Manual AF point selection
Canon Exposure Mode             : Program AE
Lens Type                       : Unknown (-1)
Long Focal                      : 55
Short Focal                     : 18
Focal Units                     : 1
Max Aperture                    : 3.6
Min Aperture                    : 22
Flash Activity                  : 0
Flash Bits                      : (none)
Zoom Source Width               : 3072
Zoom Target Width               : 3072
Color Tone                      : Normal
Focal Plane X Size              : 23.22mm
Focal Plane Y Size              : 15.49mm
Auto ISO                        : 100
Base ISO                        : 200
Measured EV                     : 9.00
Target Aperture                 : 10
Target Exposure Time            : 1/318
Exposure Compensation           : 0
White Balance                   : Auto
Slow Shutter                    : None
Shot Number In Continuous Burst : 0
Flash Guide Number              : 0
Flash Exposure Compensation     : 0
Auto Exposure Bracketing        : Off
AEB Bracket Value               : 0
Focus Distance Upper            : -0.01
Focus Distance Lower            : 5.46
Bulb Duration                   : 0
Camera Type                     : EOS Mid-range
Auto Rotate                     : None
ND Filter                       : Unknown (-1)
Self-timer 2                    : 0
Bracket Mode                    : Off
Bracket Value                   : 0
Bracket Shot Number             : 0
Canon Image Type                : IMG:EOS 300D DIGITAL JPEG
Canon Firmware Version          : Firmware Version 1.1.1
Camera Body No.                 : 0930402471
Serial Number Format            : Format 1
File Number                     : 159-5966
Owner's Name                    :
Canon Model ID                  : EOS Digital Rebel / 300D / Kiss Digital
Canon File Length               : 2387078
WB RGGB Levels Auto             : 1726 832 831 948
WB RGGB Levels Daylight         : 0 0 0 0
WB RGGB Levels Shade            : 0 0 0 0
WB RGGB Levels Cloudy           : 0 0 0 0
WB RGGB Levels Tungsten         : 0 0 0 0
WB RGGB Levels Fluorescent      : 0 0 0 0
WB RGGB Levels Flash            : 0 0 0 0
WB RGGB Levels Custom           : 0 0 0 0
WB RGGB Levels Kelvin           : 0 0 0 0
Color Temperature               : 5200
Num AF Points                   : 7
Canon Image Width               : 3072
Canon Image Height              : 2048
Canon Image Width As Shot       : 3072
Canon Image Height As Shot      : 2048
AF Points Used                  : Mid-left
Preview Quality                 : Normal
Preview Image Length            : 278318
Preview Image Width             : 1536
Preview Image Height            : 1024
Preview Image Start             : 2108760
Preview Focal Plane X Resolution: 3443.9
Preview Focal Plane Y Resolution: 3442.0
User Comment                    :
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 3072
Exif Image Length               : 2048
Interoperability Index          : R98 - DCF basic file (sRGB)
Interoperability Version        : 0100
Related Image Width             : 3072
Related Image Length            : 2048
Focal Plane X Resolution        : 3443.946
Focal Plane Y Resolution        : 3442.017
Focal Plane Resolution Unit     : inches
Sensing Method                  : One-chip color area
File Source                     : Digital Camera
Custom Rendered                 : Normal
Exposure Mode                   : Auto
Scene Capture Type              : Standard
Compression                     : JPEG (old-style)
Thumbnail Offset                : 2560
Thumbnail Length                : 7680
Image Width                     : 3072
Image Height                    : 2048
Aperture                        : 10.0
Drive Mode                      : Single-frame shooting
Flash                           : Off
Image Size                      : 3072x2048
Lens                            : 18.0 - 55.0mm
Preview Image                   : (Binary data 278318 bytes, use -b option to extract)
Preview Image Size              : 1536x1024
Scale Factor To 35mm Equivalent : 1.6
Shooting Mode                   : Program AE
Shutter Speed                   : 1/320
Thumbnail Image                 : (Binary data 7680 bytes, use -b option to extract)
WB RGGB Levels                  : 1726 832 831 948
Blue Balance                    : 1.140108
Circle Of Confusion             : 0.019 mm
Focal Length                    : 18.0mm (35mm equivalent: 27.9mm)
Hyperfocal Distance             : 1.67 m
LV                              : 14.0
Lens                            : 18.0 - 55.0mm (35mm equivalent: 27.9 - 85.3mm)
Red Balance                     : 2.075767

The desire for Performance SQL Tips

It seems people are clamouring for a more consolidated help guide for SQL performance tips.

Jay Pipes at the MySQL Camp ran a session Interactive Top 10 SQL performance Tips. There was plenty of input and discussion, and at the time Sheeri simply typed them into a wiki page for later work.

Well it seems even that rough list is popular at Del.icio.us, ranking near the top of the Hot List on the front page. I saw it earlier and it was second or third, but didn’t think of taking a screen shot until now; it’s still high.

I’d say that we could easily get the Top 10 for up to 10 different categories rather easily. Good luck Jay.

MySQL Replibeertion

MySQL Replibeertion was the last scheduled session on Day 2, and notwithstanding the free beer (a lot of it), there was a serious side with a replication discussion.

One of the first questions by Jeremy was “Are there any big replication users?” to which Sheeri quickly replied “Are you calling me fat again?”

This was a highly interactive session, here are some of the points from the audience.

Some Uses of Replication

  • Backup
  • Hot standby
  • Scaling
  • Data Warehousing
    • Slaves are larger than your database
  • For no special reason
  • Consolidation of multiple sources
  • Support for multiple indexes

Issues

  • Can break
  • replication lag
  • bi-directional replication is not supported
  • hard to set up; replication/initialization should be reducible to running one command
  • hard to know when the slave is out of sync (working but broken), and hard to diagnose
  • does not manage binary logs for you (max-bin-logs)
  • no row level replication (5.1 row based replication, change based replication, good and bad)
  • Serialized execution on the slave
  • Master does not keep track of the slaves (to the master, the slave is just another connection). Jeremy’s comment: “it really really sucks in production systems.”
  • No multi-master replication. A slave can not have more than one master.
  • ring replication: when something breaks, no way to know which copy is right.
  • no ability for delayed replication.
  • no way to get binlogs back (manually tweak the info file).
  • Master doesn’t care what data is on the slave.
  • A replication backup is really only good for the restoration of that machine
  • Default reconnection timeout is way, way too high (default of 1 minute). It should be at most 1 second with exponential fallback (master-connect-retry); currently there is no fallback, no max number of retries, and it is logged in the error log every time. A sketch of lowering it follows this list.
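
As mentioned in the last point, the retry interval can at least be lowered per slave; a hedged sketch using CHANGE MASTER TO (the value is illustrative):

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_CONNECT_RETRY = 1;
mysql> START SLAVE;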

Feature Results (Things replication needs, what you want to see)

  • Delayed Replication
  • Registered Slaves in the Master
  • Import Binlog
  • Checksum Table Events (Need ability for table checksum to be added to binlog periodically so it can be checked by the slave.)
  • Global Sequence Number
  • Connect Retry Exponential Backoff
  • Heterogeneous Replication (Oracle to MySQL). Golden Gate Software has a commercial offering
  • Command Exclusion List (sql_log_bin=0)
  • Replication filters by data on the slave
  • Show upcoming queries, skip query
  • Multi-Master to one slave
  • piping mysqlbinlog output into the mysql client fails for some character sets.
  • Binlog index capability
  • Checksum of Binary Events to determine a command is valid
  • command line interface in mysqlbinlog so you could go backwards and forwards, then execute commands.

Check out more at Google Code Blog.

MySQL Camp – Introductions & Comments

The great thing about this unconference is the lack of total formal structure. For our first session we are having an open introduction of people; there are a good 60+ people here already with more rolling in, and it’s great to hear people’s backgrounds, and also to bag Jeremy Cole at every opportunity. We have a variety of people from various backgrounds, companies and experience levels.

We are in the Kiev room, with power built into the desks, lots of desk space and full 360° swivel chairs. This is just another example of the company’s clear thinking about its requirements.

There have already been some very funny stories; I should have made notes earlier. Here are some.

Adam Ritter (Proven Scaling ride winner) was the first to bag Oracle, a really bold move with Ken Jacobs from Oracle directly behind him, and after Ken had already made his introduction.

Paul Tuckfield of the YouTube guys said to Jeremy Cole regarding his replication talk, “I want to make it, but we have already met before.” There have been about 10 baggings of Jeremy already; he is giving as good as he is getting. Proven Scaling is sponsoring the beer session tomorrow night. Great stuff Jeremy. He did also ask how many people were planning on coming, given the number of people at the MySQL Camp has tripled in the past few days.

Breaking news. Mark Callaghan from Google: “Is there anybody from YouTube here?”, to which Paul Tuckfield of YouTube identified himself. After a few quick words the Google comment was “Deal’s off”, which made everybody laugh. That’s been the level of good interaction with people here.

Flickr DB dude (his words) Dathan Vance Pattishall said to Ken Jacobs, “You work for InnoDB, right?”. Ken Jacobs’ response was “InnoDB works for me.” Again a lot of laughs.

MySQL Quotes

Frank was on a roll with MySQL quotes (it’s 1am here in New York – all that Red Bull & Vodka). Here are some of them:

Let me scale you!

Wanna scale.

Scale me Baby!

Backup Now!

MySQL – DBA Friendly.

MySQL – Use the Attitude.

MySQL. Be Bold!

MySQL. Look Again

MySQL – Coming to a website near you.

One small step for Data, one giant leap for DBA.

Data, we are serious about it.

My Job, My Passion. MySQL.

MySQL. Never Doubt.

MySQL. Scaling made Easy.

MySQL. Scaling all you want.

Got MySQL!

Do it with MySQL.

Scale Yourself.

MySQL or die.

I’ve also done some of my own shirt designs (see small images below), a number of which I already have on shirts (you can check them out on me at MySQL Camp).

Some other references include MySQL forge Merchandise and Arjen’s suggestions.


Log Buffer #13: a Carnival of the Vanities for DBAs

Unlike fellow author Giuseppe of last week’s Log Buffer #12, I volunteered for the job of this week’s Log Buffer. Lots to say, so little time, so let’s get started with Log Buffer #13.

Tom Kyte has been at the DBForum 2006 in Denmark. Apart from the contents of the forum, his picture and comment “I spied some artifacts from Mogens’ Oracle Museum, a copy of the Version 3 and Version 4 Oracle” in Dbforum 2006, in the past… was an impressive look back in time. Manuals, what are they? So how old is this? Wikipedia’s history places Oracle Version 4 at 1984, some 22 years ago. One of the comments to Tom’s entry takes us to Back to the future (Oracle 4.1 VM appliance). The title gives the article’s content away, but it is worth a view for the Oracle history. Good to also see Tom won the 42 Question Quiz on day 1, but what was the question he got wrong?

Ric Smith gives us a window into this month’s upcoming Oracle Open World with some details of Oracle Open World 2006 – Oracle Develop, “a new event tailored for the ‘geek’ in us all. The format will make for a more developer-oriented conference”. Craig Mullins is at the European International DB2 User Group conference being held in Vienna this week (it must be the month for RDBMS conferences). Details of his presentation “Change Control for DB2 Access Paths” are at IDUG in Vienna.

Build Your Own Oracle RAC Cluster on Linux again references a very detailed article and explanation by Jeffrey Hunter on RAC, and shows the benefit of contributions to the OTN Oracle community. If you’re heading to Oracle Open World this month, Justin Kestelyn mentions details of a similar presentation being held during the “Oracle on Linux Experience” portion of OTN Night on Monday Oct 23.

David Aldridge in his article Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part II has received some good responses to his concise and simple benchmark. I always strive for simplicity in solving problems, and this looks like a good simple approach to graphing I/O.

A little off the beaten track is Applying Web 2.0 to the Enterprise by Jonathan Bruce. The reason I mention this is twofold. Firstly, decisions made by project management can have a big effect on the software development process, and this in turn significantly affects the DBAs and System Administrators that support those systems. The article also mentions Agile Software Development, of which I am a strong proponent. As I have a very detailed database background I’m also wary of some of the “strengths” mentioned generally with Agile. A topic I’m happy to discuss more at some time.

E A D G B E. I have no idea what that means, you will need to read Ian Thain’s article regarding the Sybase WorkSpace to find out. He publishes some healthy performance improvement throughput figures with his 3 tuning guidelines.

Firebird 2.00 Release Candidate 5 has also been released this week. The news article indicates that this will probably become the release version.

Peter Scott reminds us that with all the technology advances and an existing 8 year old system which includes documentation, things still happen in I hate on call.

Greg Sabino Mullane over at Planet PostgreSQL is keeping abreast of the various open source offerings with his report on Berkeley DB now does MVCC. His comment “Looks like Oracle is actually doing something with their purchase …. Curiously, this comes right at the point when MySQL is dropping the BDB engine from their product.”. Hmmm, interesting observation, however it wasn’t the actual reason why BDB was dropped from MySQL. The actual reason mentioned some time ago can be found at BDB Engine removal.

Are we working in a booming industry? “Overall, Gartner is predicting that the worldwide DBMS market is around $14 billion and will continue to grow by nearly 7% per year”. This comment by Zack Urlocker is from his attendance of the “Gartner Open Source Summit - a very thorough analysis of the impact of open source technology in the database market.” You can read all his comments of the summit in Gartner Mastermind panel and Gartner on Open Source Databases.

Exploring the secrets of intermediate materialization by Adam Machanic revives a trick he had in SQL Server 2000 for improving logical reads when query tuning. This example shows its operation in SQL Server 2005.

Peter Zaitsev gives us a quick refresher on his MySQL Performance Blog with What to tune in MySQL Server after installation. A good introduction reference of configurable system variables, particularly for those non-MySQL DBA’s that need to also support a MySQL installation. Mike Kruckenberg also gives us a valuable consolidated reference in his twin articles, Guide to Incompatibilities when Upgrading MySQL to Version 4.1 and Guide to Incompatibilities when Upgrading MySQL to Version 5.0. Essential reading for clearly understanding MySQL database upgrades and possible traps.

MySQL Tools for Microsoft Visual Studio 1.0.1 beta has been released. Enough said. Ok, well for those that want some more detail, I quote from Reggie Burnett: “a downloadable plug-in for Visual Studio 2005 that allows Windows developers to quickly build MySQL data-driven applications with Visual Studio. With this plug-in, developers will be able to create, modify and manage MySQL database objects with an easy-to-use interface from within the Visual Studio IDE.” If only I used Microsoft I could check it out!

Daniel Schneller highlights one of the problems in a large scale out MySQL implementation in his article MySQL replication timeout trap. Valuable information in a network infrastructure to ensure your slaves are performing optimally.

Normally I’d summarise a worthy article for review; this time I’ve reproduced the concise summary by Jason Gaylord in Preventing SQL Injection Attacks, which explains his content. Scott Guthrie just posted some really good stuff about preventing SQL injection attacks. In his blog post he talks about an application that Michael Sutton created to check SQL injection attacks by screening Google search and looking for sites with QueryString, etc. Check out his post for more details: http://weblogs.asp.net/scottgu/archive/2006/09/30/tip_2f00_trick_3a00_-guard-against-sql-injection-attacks.aspx

Of the big 5 or 6 RDBMS products of the past 2 decades, DB2 is the only one that hasn’t crossed my path in some way. Willie Favero writes What’s in a name – The saga continues…, sharing his views on the official name of DB2® Version 9 for z/OS.

Jeremy Cole has been busy in recent months with his new found freedoms in his new venture Proven Scaling. He has released another MySQL Source Patch with On Triggers, Stored Procedures, and Call Stacks. Keep em comin’ Jeremy. And just as I complete this weeks Log Buffer, good mate Jay (the plumber) Pipes has published HOWTO: Making a Corresponding Test Case for your Patch. Very worthy information for all those past, present and future patch writers.

The Data Charmer Giuseppe Maxia gives us the inside goss on his recent vacation interests in Take the MySQL Certification in five steps. Good Advice, I liked Point 5, and the unofficial Point 6. Marcus Popp also points us to New Lists of Certified Candidates online so you can see your name in lights. Reminds me to stop procrastinating and to take the MySQL 5 exams myself. It’s been on the cards for a few months now.

We end this week with one of those feel-good stories, of something that inspires me. Paul McCullagh has written his own MySQL transactional storage engine. PBXT beta 0.97 has just been released as a pluggable storage engine for MySQL 5.1. Quoting Paul, “PBXT is the first full featured engine to be released in this form.” This leverages a new feature in the upcoming MySQL 5.1 GA release where developers can use MySQL’s extensible Storage Engine Architecture as a plugin, without the need to recompile the MySQL source. Look out for a lot more opportunities in storing and accessing different types of data in the future with this feature. [Author Side Note: Compiling MySQL from the latest BK tree may contain code features that are not fully tested (e.g. 2 Oct 2006). It’s best when integrating other patches or plugins to use a known MySQL Source Snapshot, otherwise things may break!]

And with a certain amount of deja vu from last week’s closing Log Buffer #12 comment by Giuseppe, yours truly will also be joining MySQL. Check out my If you can’t beat them, join them.

That’s all for lucky #13. Thanks for the opportunity Dave.

If you can't beat them, join them!

Like fellow friends and MySQL’ers before me Morgan, Roland, Giuseppe, Markus and Sean, I’ve joined the MySQL juggernaut on the ride of my life, achieving two of my short/medium term professional goals in one step. Woot!

It says something to me about the company I’m very excited to work for when I knew of all these people before they joined MySQL this year (2006). I’ll also be joining other friends and MySQL people Arjen, Jon, Jay, Colin, Michael Z and I still have a list of friends that I’ve met while being part of the MySQL community.

And as Giuseppe said, I’ll be working in a virtual company. Another article I like to point others to about MySQL is MySQL: Workers in 25 countries with no HQ.

I’ll leave you with the MySQL Values from the Company About MySQL AB page.

We want the MySQL server to be:

  • The best and the most used database in the world
  • Available and affordable for all
  • Easy to use
  • Continuously improved while remaining fast and safe
  • Fun to use and improve
  • Free from bugs

MySQL AB and the people of MySQL AB:

  • Subscribe to the Open Source philosophy
  • Aim to be good citizens
  • Prefer partners that share our values and mindset
  • Answer email and give support
  • Are a virtual company, networking with others

Logical Data Modelling (LDM)

Following my User Group Presentation I was asked by fellow MySQLer Kim about Logical Data Modelling (LDM), in relation to Physical Data Modelling.

Well, first the brain had to work overtime to remember the last time I worked on a Logical Data Model. The answer is 1996, doing R&D work for Oracle Corporation with their CASE repository tool, Oracle Designer, around version 1.3/1.3.2. I’ve learnt in the past 10 years to purge technical stuff from my brain, a change from the days when I could remember in detail the data models, data migration and data cleansing issues of projects even 3 years after leaving them.

As Kim pointed out, he thinks physically, actually directly at the SQL level, then works backwards to produce an appropriate physical model. To think logically is to consider the entities, attributes and relations before considering the physical tables, columns and relationships. So how do you program somebody to think logically? In Kim’s case, he is undertaking formal studies after already grasping the concepts of software development. Generally today more people don’t undertake the formal education, and we end up with The Hobbyist and the Professional syndrome.

I guess in summary I’d argue why bother. Does anybody still do traditional logical models? Feedback/comments are welcome. (Professionally I believe there is a place for LDM).
However for the purpose of the exercise let’s start with a Physical Representation and present a Logical Representation of that to see the differences.



This physical representation is the classic library example: a table of Books, a table of Authors, and an intersection table (3NF) to indicate that a Book can have one to many Authors, and likewise an Author can have one to many Books; a sketch follows.
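
A minimal sketch of that physical model, with illustrative names and datatypes:

CREATE TABLE author (
  author_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name      VARCHAR(100) NOT NULL
);

CREATE TABLE book (
  book_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title   VARCHAR(200) NOT NULL
);

CREATE TABLE book_author (
  book_id   INT UNSIGNED NOT NULL,
  author_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (book_id, author_id)
);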

So how would this have been represented in a Logical Model?



Some things to consider in Logical Modelling.

  • Attributes really only require a name, and perhaps a datatype using just number, string or date (Jim Starkey would be happy), but nothing specific like SMALLINT or VARCHAR(50).
  • An attribute’s Mandatory/Optional state may or may not be known
  • Information is not in 3NF, i.e. relationships can be many to many (in our example, there is no concept of the intersection table)

I’m going from memory here, so there are probably more points to consider.

So how do you teach this when you are trying to work backwards? When I learnt this back in 1988 (some 18 years ago), I’d never created a database or used SQL, so I didn’t have the history. Arjen had a brilliant idea for considering logical modelling after the fact: views. Consider an end user requirement for reporting, and how you would represent your model better to an end user (effectively de-normalising with views so users don’t have to know about joins).

In this case you would create a view of Books and a view of Authors. Details such as mandatory/optional aren’t important to the end user report (it’s not as if they need to be enforced), and specific datatype details again are not that important; knowing the basics of how to format a number or a date is enough.
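
Building on the illustrative tables above, a sketch of such a view (GROUP_CONCAT is available from MySQL 4.1):

CREATE VIEW book_details AS
SELECT b.title,
       GROUP_CONCAT(a.name SEPARATOR ', ') AS authors
FROM book b
JOIN book_author ba ON ba.book_id = b.book_id
JOIN author a ON a.author_id = ba.author_id
GROUP BY b.book_id, b.title;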

An interesting approach that worked well in our explaining.

The Text Book

So it was interesting to go back to the text book using C.J. Date’s An Introduction to Database Systems to review definitions.

Logical isn’t really referenced; the term used is conceptual. A conceptual definition: “The right way to do database design is to get the logical design right first, without paying any attention whatsoever at that stage to physical – that is, performance – considerations.”

Also, for example, we get Third Normal Form from Section 12.3, First, Second and Third Normal Forms.

Third normal form (very informal definition): A relvar is in 3NF if and only if the nonkey attributes (if any) are both:

  • a. Mutually independent
  • b. Irreducibly dependent on the primary key

We explain the terms nonkey attribute and mutually independent (loosely) as follows:

  • A nonkey attribute is any attribute that does not participate in the primary key of the relvar concerned.
  • Two or more attributes are mutually independent if none of them is functionally dependent on any combination of the others. Such independence implies that each attribute can be updated independently of the rest.

Man, no wonder many years of experience, having generally seen most cases, enable me to forget this and not feel like I’ve forgotten something.

The Hobbyist and the Professional

I first coined this term in February 2006 in a paper titled “Overcoming the Challenges of Establishing Service and Support Channels” for the conference “Implementing Open Source for Optimal Business Performance” View Paper.

I must have referenced this several times; time to give this topic its due notice. In summary it targets three areas:

  • the lack of appropriate database design in open source projects
  • the lack of coding standards
  • the lack of sound programming principles

Here are the bullet points from the slide in a presentation.

Hobbyist

  • Download-able software and examples
  • Online tutorials
  • Books like Learn in 24 hours/For Dummies

Professional

  • Formal Qualifications
  • Grounding in sound programming practices
  • Understanding of SDLC principles
  • Worked in team environment

Middle Ground Developer

  • Time to skill versus output productivity
  • Depends on environment and requirements

Slow Queries aren't always that bad!

Well, now I have your attention: slow queries are bad (unless it’s a single user system and you don’t care). However there are worse things than slow queries in a large enterprise system.

I’ve been asked a number of questions in recent weeks that have brought this topic up for discussion, as has a current implementation I’m undertaking for a client of a purchased product.

High volume repetitive queries can have a worse effect on your system’s performance. Combined with slow queries that take locks, these queries can have an extreme effect on performance, and if you don’t know your application, or have the right tools, it can be hard to diagnose easily at first.

This problem is the classic Wasting CPU cycles problem, seen it before, will see it again.

Here’s a classic example for reference, using the current product that I’m customising and installing (I’ve not been involved in any development): a typical Customer Information System (CIS).

The system when deployed will have 1.1 million customers. Each customer has a balance. Now, the customer balance is not recorded against the customer record; it’s calculated every time it’s required. No big deal? Well, yes it is. The stored procedure to calculate the balance hits 4 different tables, and one of these is one of the largest in the system (recording all detailed financial transactions). Multiple batch reports alone that work with large sets of customers all require or use the current balance in some way.

There are only a small set of transactions that affect the balance, including Invoice Statements, Payments, Adjustments, Credit Control and Write-Offs. These processes by their own nature are either batch or small online transactions. The calculation of the balance can be applied against a Chunk of customers in batch, or an individual Row for online transactions (See The RAT and the CAT).

Even for the paranoid, re-calculating the customer balance for all customers in batch is more efficient after hours, and only customers with some transaction applied since the last re-calculation ran need to be processed.
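
A hedged sketch of that batch approach (customer, financial_transaction, @last_run and all column names are hypothetical, not the product’s actual schema):

-- Recalculate the stored balance only for customers with activity since the last run
UPDATE customer c
   SET c.balance = ( SELECT SUM(t.amount)
                       FROM financial_transaction t
                      WHERE t.customer_id = c.customer_id )
 WHERE EXISTS ( SELECT 1
                  FROM financial_transaction t
                 WHERE t.customer_id = c.customer_id
                   AND t.created >= @last_run );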

MySQL doesn’t by default have any analysis tools to identify and manage these types of queries. Peter Zaitsev in his article Slow Query Log analyzes tools highlights the issue, with a good approach via a source code change to long_query_time as well as some additional scripts.

Hopefully MySQL will consider an improved approach in future releases that doesn’t require patching the source code. Now, it’s unlikely that MySQL will do another SHOW STATUS Gotcha and, for example, change the unit of measure from seconds to milliseconds, but they could make it a float, so you could specify 0.1 for example.
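
If a future release did accept a float (purely hypothetical at the time of writing), catching all queries over 100 milliseconds would be as simple as:

mysql> SET GLOBAL long_query_time = 0.1;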

Stories that impress and motivate you

I’ve worked for two Internet startup companies, around 2 years each, both now long dead. The first died from an eventual lack of new VC funds, the second from gross financial management in the second year. (Apparently, when we were told one December there was no money to pay us, the company that had made large profits every month for over the first year had then made losses every month for the past 12 months, but nobody knew about it. There were 5 directors from 3 countries and nobody knew. Yeah right!)


I’ve learnt a lot of non-IT street smarts in this time. The first startup took the VC route, and after 3 rounds, while I wasn’t involved in the process, you pick things up. The single biggest tip here is the Bell-Mason Diagnostic. Here are a few introductory references worthy of a quick review (One, Two).

When you take any great idea, and then consider the 4 quadrants and 12 axes, you realise you really need to make larger circles of professional contacts.

Zac’s article Valley Boys Run MySQL talks about the new breed of Web entrepreneurs and Web 2.0. In particular check out Valley Boys: Digg.com’s Kevin Rose leads a new brat pack of young entrepreneurs. This is the new wave of success that works without VC.

For me this recent post on the Meebo Blog really impressed me.

365 days ago

Nineteen releases, eleven (fantastic) team members, and 295,321 gummy bears later we see over a million users log into meebo each day. As a coder, all you hope is that your service will be able to help people go about their day-to-day lives. Stability, bugs, and good usability are always top of mind.

1,000,000 users of your Web 2.0 application. This impresses and motivates me. What makes it possible for anybody is that you can get a LAMP stack, a live CD, cheap hosting, and turn your idea into something real for next to no cost. Of course, I won’t start on the nightmares out there of great ideas that are very poorly designed. At least the underlying stack can support anything you want to achieve, and MySQL is behind these success stories.

One more thing on Meebo: check out the meebo map! I’ve been told that Google has something of a similar nature at the Googleplex. Well, I can say I’m very keen to see it, and will in 8 weeks time when the first MySQL Camp is held.



MySQL Trigger Features

Sheeri talked a little about MySQL Triggers in One TRIGGER Fact, Optimizer Rewriting Stuff. While it’s great that MySQL 5.0 has Triggers, there are some things that I feel could become features in future releases.

IF EXISTS

One of the beautiful features that MySQL has is IF EXISTS. This conditional operation, which performs the statement if the object exists and does nothing if it does not, works wonders when reviewing logs for errors. One of the problems with Oracle, for example, is the requirement to ignore the ORA errors for non-existent objects.

But this functionality doesn’t exist for triggers. One must wonder why. I’d like to see this:

mysql> DROP TRIGGER IF EXISTS trigger_name;

MySQL Manual DROP TRIGGER

REPLACE

One feature that compensates for the lack of IF EXISTS functionality in Oracle is REPLACE. The syntax is:

oracle> CREATE OR REPLACE TRIGGER trigger_name...

In this case, this functionality effectively eliminates the need for a DROP IF EXISTS; however, I’d not like to see this introduced into MySQL, especially as it’s Oracle specific syntax and not ANSI SQL.

Multiple Triggers of same type per table

MySQL only allows one trigger per type per table. That is, there can only be one BEFORE UPDATE, for example. You may ask why you would need more. Here is a typical situation.

You use triggers to perform some level of business functionality; determining values for optimised (denormalised) columns is a good example. So you write an appropriate trigger for that piece of functionality.
You also use triggers to perform database auditing; that is, for every insert/update/delete of data, you record a full copy of the change in an audit database. One way to ensure this is consistent across the entire database is to implement it via triggers, so you leverage programming functionality to pre-create triggers for all tables to manage this.

The problem with MySQL occurs in that you now have to merge these triggers for tables that require both. If you want to deploy your application into a test environment, you may wish not to deploy your auditing triggers, but now that functionality is mixed in with business logic.
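
A hedged sketch of the merge problem (customer, customer_audit and all columns are hypothetical); note how the audit logic has no choice but to live inside the business trigger:

DELIMITER //
CREATE TRIGGER customer_bu BEFORE UPDATE ON customer
FOR EACH ROW
BEGIN
  -- business logic: maintain a denormalised column
  SET NEW.full_name = CONCAT(NEW.first_name, ' ', NEW.last_name);
  -- auditing: record the prior values, mixed into the same trigger
  INSERT INTO customer_audit (customer_id, first_name, last_name, changed)
  VALUES (OLD.customer_id, OLD.first_name, OLD.last_name, NOW());
END//
DELIMITER ;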

Multi Type Triggers

Another cool Oracle feature is the capacity to define a trigger for multiple types. For example:

oracle> CREATE OR REPLACE TRIGGER trigger_name
oracle> BEFORE INSERT OR DELETE OR UPDATE ON table_name ...

In this example, with MySQL you would need to create three separate triggers.
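
A sketch, using the same hypothetical names and with the bodies elided as in the Oracle example:

mysql> CREATE TRIGGER trigger_name_bi BEFORE INSERT ON table_name FOR EACH ROW ...
mysql> CREATE TRIGGER trigger_name_bu BEFORE UPDATE ON table_name FOR EACH ROW ...
mysql> CREATE TRIGGER trigger_name_bd BEFORE DELETE ON table_name FOR EACH ROW ...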

Other features

While less important, and one would need to consider whether they are necessary in MySQL, there is other Oracle provided trigger functionality. This includes:

  • WHEN CLAUSE trigger restriction
  • Triggers on DDL Statements such as CREATE, DROP, ALTER
  • Triggers on Database events such as LOGON, LOGOFF, STARTUP, and SHUTDOWN
  • INSTEAD OF Triggers (for Views)
  • STATEMENT based trigger

For more information you can check out the Oracle documentation: Introduction to Triggers, CREATE TRIGGER, and a Documentation Search on Triggers.

Brisbane Users Group – MySQL Hackfest

Last night we had a number of keen souls at the Brisbane MySQL User Group. I was very impressed to see the majority of people with laptops at hand.

You can download my slides and code examples at my Articles page.

In our hands-on Hackfest tutorial we created the new command SHOW USERGROUPS. Before anybody makes a comment: it was stated in the presentation that this command is a dummy one, and a poor candidate for two reasons.

  1. The results should be more dynamic, rather than hardcoded into the source tree
  2. USERGROUPS is not an ideal name due to comparisons with Users, Groups, Roles etc

Still, it was productive; here is the outcome of our work for the evening:

mysql> SHOW USERGROUPS;
+----------------------+-------------------------+-----------------------------+
| Name                 | Location                | Website                     |
+----------------------+-------------------------+-----------------------------+
| Brisbane Users Group | Brisbane, QLD Australia | http://mysql.meetup.com/84/ |
| Boston Users Group   | Boston, USA             |                             |
+----------------------+-------------------------+-----------------------------+
2 rows in set (0.00 sec)

The trailing slash ‘/’ was Hitcho’s contribution :)

mysql> SHOW AUTHORS;
+--------------------------------+---------------------------------------+-------------------------------------------------------------+
| Name                           | Location                              | Comment                                                     |
+--------------------------------+---------------------------------------+-------------------------------------------------------------+
| Brian (Krow) Aker              | Seattle, WA, USA                      | Architecture, archive, federated, bunch of little stuff :)  |
...
| Brisbane Users Group           | Brisbane, QLD Australia               | New Command SHOW USERGROUPS                                 |
+--------------------------------+---------------------------------------+-------------------------------------------------------------+
75 rows in set (0.05 sec)

I’ll be publishing more beginner tutorials in my MySQL Compiling Category over time.

Update: Check out http://www.arabx.com.au/hackfest.diff.htm for a diff of files to produce this output.

Compiling MySQL Tutorial 3 – Debugging Output

Continuing on from Tutorial 2.

When reviewing the 2.1. C/C++ Coding Guidelines for MySQL, you will see that the MySQL source uses DBUG (Fred Fish’s debug library) within the C/C++ code. This is one of the 3rd party open source products documented in the Internals manual, 1.4. The Open-source Directories.

You will also find some usage in the MySQL Manual, E.3 The DBUG Package. So, enough talk; how do you use it?

DBUG

You get the DBUG output by running a debug version of mysqld with the argument --debug. The output for the SHOW AUTHORS command is:

T@9190304: >dispatch_command
T@9190304: | >alloc_root
..
T@9190304: | query: SHOW AUTHORS
T@9190304: | >mysql_parse
T@9190304: | | >mysql_init_query
T@9190304: | | | >lex_start
T@9190304: | | | | >alloc_root
T@9190304: | | | | | enter: root: 0x9590b38
T@9190304: | | | | | exit: ptr: 0x95c1c70
T@9190304: | | | | <alloc_root
T@9190304: | | | <lex_start
T@9190304: | | | >mysql_reset_thd_for_next_command
T@9190304: | | | <mysql_reset_thd_for_next_command
T@9190304: | | <mysql_init_query
T@9190304: | | >Query_cache::send_result_to_client
T@9190304: | | <Query_cache::send_result_to_client
T@9190304: | | >mysql_execute_command
T@9190304: | | | >mysqld_show_authors
T@9190304: | | | | >alloc_root
T@9190304: | | | | | enter: root: 0x9590b38
T@9190304: | | | | | exit: ptr: 0x95c1c78
T@9190304: | | | | <alloc_root
...
T@9190304: | | | | >send_fields
T@9190304: | | | | | packet_header: Memory: 0x8c2480  Bytes: (4) 01 00 00 01
T@9190304: | | | | | >alloc_root
T@9190304: | | | | >Protocol::write
T@9190304: | | | | <Protocol::write
...
T@9190304: | | | | >send_eof
T@9190304: | | | | | packet_header: Memory: 0x8c2550  Bytes: (4) 05 00 00 51
T@9190304: | | | | | >net_flush
T@9190304: | | | | | | >vio_is_blocking
T@9190304: | | | | | | | exit: 0
T@9190304: | | | | | | <vio_is_blocking
T@9190304: | | | | | | >net_real_write
T@9190304: | | | | | | | >vio_write
T@9190304: | | | | | | | | enter: sd: 23, buf: 0x0x95b9bd0, size: 5107
T@9190304: | | | | | | | | exit: 5107
T@9190304: | | | | | | | <vio_write
T@9190304: | | | | | | <net_real_write
T@9190304: | | | | | <net_flush
T@9190304: | | | | | info: EOF sent, so no more error sending allowed
T@9190304: | | | | <send_eof
T@9190304: | | | <mysqld_show_authors
T@9190304: | | <mysql_execute_command
T@9190304: | | >query_cache_end_of_result
T@9190304: | | <query_cache_end_of_result
...
T@9190304: | <mysql_parse
T@9190304: | info: query ready
T@9190304: | >log_slow_statement
T@9190304: | <log_slow_statement
T@9190304: <dispatch_command

Using some of the debug options you get the following output with --debug=d,info,error,query,general,where:O,/tmp/mysqld.trace

create_new_thread: info: creating thread 3
create_new_thread: info: Thread created
?func: info: handle_one_connection called by thread 3
?func: info: New connection received on socket (23)
?func: info: Host: localhost
?func: info: vio_read returned -1,  errno: 11
thr_alarm: info: reschedule
process_alarm: info: sig: 14  active alarms: 1
?func: info: client_character_set: 8
check_user: info: Capabilities: 238213  packet_length: 16777216  Host: 'localhost'  Login user: 'rbradfor' Priv_user: ''  Using password: no Access: 0  db: '*none*'
send_ok: info: affected_rows: 0  id: 0  status: 2  warning_count: 0
send_ok: info: OK sent, so no more error sending allowed
do_command: info: vio_read returned -1,  errno: 11
thr_alarm: info: reschedule
process_alarm: info: sig: 14  active alarms: 1
do_command: info: Command on socket (23) = 3 (Query)
dispatch_command: query: SHOW AUTHORS
send_eof: info: EOF sent, so no more error sending allowed
dispatch_command: info: query ready
do_command: info: vio_read returned -1,  errno: 11
thr_alarm: info: reschedule
process_alarm: info: sig: 14  active alarms: 1
do_command: info: Command on socket (23) = 1 (Quit)
~THD(): info: freeing security context
end_thread: info: sending a broadcast
end_thread: info: unlocked thread_count mutex


I have found the following format good for comparing with the MySQL source code: --debug=d:N:F:L:O,/tmp/mysqld.trace

   49:   sql_parse.cc:  1536: do_command: info: Command on socket (23) = 3 (Query)
   50:     my_alloc.c:   172: alloc_root: enter: root: 0x9068b38
   51:     my_alloc.c:   219: alloc_root: exit: ptr: 0x9099c30
   52:   safemalloc.c:   128: _mymalloc: enter: Size: 16384
   53:   safemalloc.c:   197: _mymalloc: exit: ptr: 0x909cc80
   54:   safemalloc.c:   262: _myfree: enter: ptr: 0x9095bf8
   55:   sql_parse.cc:  1757: dispatch_command: query: SHOW AUTHORS
..
   88:    sql_show.cc:   385: mysqld_show_authors: packet_header: Memory: 0x940590  Bytes: (4)
..
  166:    protocol.cc:   385: send_eof: packet_header: Memory: 0x940550  Bytes: (4)
05 00 00 51
  167:    viosocket.c:   184: vio_is_blocking: exit: 0
  168:    viosocket.c:   105: vio_write: enter: sd: 23, buf: 0x0x9091bd0, size: 5107
  169:    viosocket.c:   117: vio_write: exit: 5107
  170:    protocol.cc:   342: send_eof: info: EOF sent, so no more error sending allowed
  171:     sql_lex.cc:   199: lex_end: enter: lex: 0x9068b58
  172:   sql_parse.cc:  1796: dispatch_command: info: query ready
  173:     my_alloc.c:   331: free_root: enter: root: 0x9068b38  flags: 1
  174:    viosocket.c:   184: vio_is_blocking: exit: 0
  175:    viosocket.c:    36: vio_read: enter: sd: 23, buf: 0x0x9091bd0, size: 4
  176:    viosocket.c:    49: vio_read: vio_error: Got error 11 during read
  177:    viosocket.c:    52: vio_read: exit: -1
  178:   sql_parse.cc:   809: do_command: info: vio_read returned -1,  errno: 11

  190:   sql_parse.cc:  1536: do_command: info: Command on socket (23) = 1 (Quit)
  191:     my_alloc.c:   331: free_root: enter: root: 0x9068b38  flags: 1
  192:      mysqld.cc:  1681: close_connection: enter: fd: socket (23)  error: ''
  193: sql_handler.cc:   628: mysql_ha_flush: enter: tables: (nil)  mode_flags: 0x02

This will make a lot more sense with my upcoming presentation, where you will see clear use of and interaction between source files such as sql_parse.cc & sql_show.cc.

For more information check out Making Trace Files.

It makes me cry

I got home today and sat down to read my home email list. Nothing new. But on a MySQL mailing list there was an enquiry as to why performance was slowing in a given application. I didn’t even have to read the situation or the problem; it took less than the 200ms mentioned to identify the problem looking at the supplied schema.

In summary, the first table in the schema had a primary key of VARBINARY(255) and an engine type of InnoDB. Hold on, wait: it’s a concatenated key of two VARBINARY(255) columns. And I should mention that this primary key was a foreign key in the next table. If this was a home website app with one user, OK, it’s still bad, but this application was having performance problems with reasonable volumes of transactions; it’s not a beginner application. (A recent reference: If you don’t know your data, you don’t know your application.) Where do people learn this from?!
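
To sketch the shape of the problem (hypothetical names; InnoDB carries the primary key in every secondary index entry, and here it is also copied into every child row as a foreign key):

-- The reported design: a concatenated 510-byte primary key
CREATE TABLE parent (
  code_a VARBINARY(255) NOT NULL,
  code_b VARBINARY(255) NOT NULL,
  PRIMARY KEY (code_a, code_b)
) ENGINE=InnoDB;

-- A leaner alternative: a 4-byte surrogate key, with the natural key kept unique
CREATE TABLE parent_alt (
  parent_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  code_a    VARBINARY(255) NOT NULL,
  code_b    VARBINARY(255) NOT NULL,
  PRIMARY KEY (parent_id),
  UNIQUE KEY (code_a, code_b)
) ENGINE=InnoDB;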

Now, I want to work on a large scale-out MySQL environment, an organisation or site that’s going places, but the world of even normal everyday applications is littered with these basic fundamental design errors. It makes me cry.

My last contract had many flaws across the business, and the Java code for production applications was just as bad. Check out things like What constitutes a good error message to the user? and Why IT professionals get a bad name (I could easily write one of these a day for a long time, or you could check out The Daily WTF). You would think that with the new contract, at a large existing multinational company with a world-wide installed Java application, I would get an improvement. I’ll leave those stories for another time. I don’t want to have to deal with this.

I’ve got 16 half-written blogs on cool MySQL features or interesting topics, but I never seem to get the motivation at the end of the day. I just want to know: how do people in IT get away with this? Where is the pride in people’s work? Where is the desire to make what they do a better place? Where is:

An opportunity that’s a challenge, including involving hard work and tight time frames, but a job that provides the rewards of job satisfaction for a productive contribution. An importance on quality, emphasis on continued improvement, and goals of simplicity in complex situations while working in a team environment are also necessary.

SHOW STATUS Gotcha

Well, it’s Sunday night, so I will put this down to it being the weekend. The background to being caught out is a request I made to my local Users Group mailing list for some information on people’s environments, because I wanted to do some empirical data analysis without having any more knowledge of the systems.

In summary (without the surrounding fan-fare, I was seeking):

SELECT VERSION();
SHOW STATUS;
SHOW VARIABLES;  -- Optional

I was however perplexed as to why my first data point analysis (read/write ratio) using the status values Com_insert, Com_update, Com_delete and Com_select was not always giving me expected results. In particular, a number of server results showed 0 for values when I knew the results came from working MySQL environments.

So, a sanity check with good friend Morgan gets the response that answers the dilemma: SHOW STATUS defaults to session-based statistics in 5.0+. There’s a million people it’s caught out, and many bugs about “xyz variable not being incremented”.


So I’m really seeking:

SELECT VERSION();
SHOW GLOBAL STATUS;
SHOW GLOBAL VARIABLES;  -- Optional

Well, that’s my MySQL trivia point for the day. MySQL between versions 4.1 and 5.0 made a major change to the default scope. The MySQL 5.0 Manual also states this.
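
The fix in practice is simply being explicit about scope:

mysql> SHOW SESSION STATUS LIKE 'Com_select';  -- this connection only (the 5.0 default)
mysql> SHOW GLOBAL STATUS LIKE 'Com_select';   -- server-wide, since startup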

UPDATE 23 Sep 2006

Following Eric’s Show @&!# status again!, I did then read Bug #19093, and it was great to see Monty comment (that’s open source for you, enabling even founders to mix with the common folk); I’d tend to agree with his comments. MySQL has attempted to correct something.

Perhaps the best compromise to the problem is to deprecate and drop the bare SHOW STATUS command, making the scope component GLOBAL|SESSION compulsory, so that SHOW GLOBAL STATUS and SHOW SESSION STATUS are the only valid commands!

The RAT and the CAT

No, it’s not a bedtime story; it’s a serious systems design concept, and I’m amazed that people don’t know about it.
As I mentioned in If you don’t know your data, you don’t know your application, I was doing a Java code review, and I found a clear case for a much simpler solution. How simple, you ask?

Well, without completing the task 100%, I achieved in less than 1 day (and let’s say for argument 1 more day of work) what somebody else had worked on for a week, with an estimate of 2 more weeks to complete. So let’s add 50% to my estimate; that’s a total of 3 days versus 15 days. You do the math. And yes, that was last week, and that task is still being worked on the same way, even with reference to my working code. Not to mention my code is of a similar magnitude of simplicity, and simplicity means cost savings in support; people so quickly forget that.

So what is this RAT and CAT:

  • RAT – Row At a Time
  • CAT – Chunk At a Time

This concept is really rather simple, however it’s amazing how implementations never consider it. Let me give you a simple example.

You’re importing some external data (a flat file of 5 columns, which in turn will be stored in 2 tables in a normalised form). The RAT way of doing things would entail:

  • Using the language in question, open the file and read it line by line, converting each line into an object for reference.
  • Now for each row read:
    • You get the first values (A and B), and then do a select to see if this row already exists in one of the normalised tables (call it X). If it does, you have the surrogate key; if not, you insert the row and get the surrogate key.
    • Now you do the same with the next set of values (C, D, E), which reference the normalised values (A and B), inserting into Y.
    • And so on and so on.
    • Report exceptions line by line if found.

For those now laughing, it’s not really funny; it’s sad that programs are written this way. For those that say that’s the only way, well, welcome to a different, radical and hard to learn approach.

The CAT way of doing things would entail (see the sketch after this list):

  • Create a temporary table
  • Bulk load the data into the temporary table via an appropriate command, e.g. mysqlimport or LOAD DATA.
  • Using one statement, insert into X rows from the temporary table that are not already present
  • Using one statement, insert into Y rows from the temporary table that are not present, joining to X to get the appropriate surrogate key
  • Report exceptions via one select of data that wasn’t inserted due to some rule
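
A hedged sketch of those steps (the file name, tables x and y, and all columns are hypothetical):

-- Bulk load the raw file into a temporary staging table
CREATE TEMPORARY TABLE import_raw (
  a VARCHAR(50), b VARCHAR(50),
  c VARCHAR(50), d VARCHAR(50), e VARCHAR(50)
);
LOAD DATA LOCAL INFILE '/tmp/import.dat' INTO TABLE import_raw;

-- One statement: insert X rows not already present
INSERT INTO x (a, b)
SELECT DISTINCT r.a, r.b
  FROM import_raw r
  LEFT JOIN x ON x.a = r.a AND x.b = r.b
 WHERE x.x_id IS NULL;

-- One statement: insert Y rows, joining back to X for the surrogate key
INSERT INTO y (x_id, c, d, e)
SELECT x.x_id, r.c, r.d, r.e
  FROM import_raw r
  JOIN x ON x.a = r.a AND x.b = r.b;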

It’s not rocket science, people.

I should mention this is a simple example, and it’s not always possible to do CAT processing for all operations, but generally some portion of batch work can be, with the remainder done in a RAT way.

Updated
Some more recent articles include We need more CATs (2009) and The Art of Elimination (2010).

If you don't know your data, you don't know your application.

The art of data modelling is definitely lost on some [most] people, or they never found it, even though they think they did. Over dinner with good friend Morgan last night we were swapping stories on the topic.

Morgan wrote recently about I want my 4 bytes back damn it., an interesting example of storing an ISBN. Further reference can be found in Getting started with MySQL, with an even more impractical ISBN example.

Disk is cheap now, so the attitude and poor excuse can be: well, a few extra bytes don’t matter. Well, no! Maybe if you’re a social hacker with a website peaking at 2 concurrent connections; but much like some recent Java code reviewing I just performed, just because the system isn’t 24×7 doesn’t give you an excuse to be lazy about writing code that handles concurrency and thread safety, and is as efficient as possible, in this case RAT versus CAT. (I’ll need to write about this; it seemed to go over the heads of some of the other professionals even.)

I can remember a very specific example some 10 years ago when doing some performance analysis on a site. I’d identified the need for an additional index on a table. Now, this table was sized for 200 million rows, and it already had about 70 million. The problem was that adding another index required a 4GB disk allocation. These things have an effect on sizing, growth and backups.

So the impact of appropriate sizing can clearly have an effect. If it was just one poorly sized column, that’s acceptable (just), but normally it’s a pattern that litters a data model.

What’s important to realise is that it’s not just diskspace, it’s also memory. Without really touching on sizing data, I did mention some examples previously in Improving Open Source Databases – WordPress. There, the use of BIGINT(20) for primary keys proved my point. That’s 8 bytes, but unless you are going to have 4 billion categories, it’s a waste. It’s a waste when it’s a foreign key in a table, and it’s a big waste when it’s indexed and that index is in memory, wasting more precious resources.
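
A hypothetical side-by-side (not the actual WordPress DDL) of what that sizing decision looks like:

-- 8 bytes per key value, repeated in every index entry and foreign key
CREATE TABLE category_big (
  category_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(50) NOT NULL
);

-- 2 bytes, still allowing 65,535 categories
CREATE TABLE category_sized (
  category_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(50) NOT NULL
);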

So how do you identify when the designer of the model has no idea about the intrinsic value of the data being stored? If you see tables with VARCHAR(255), that’s a clear sign. They have no idea regarding the data, so a default limit is used. Morgan referred to it as “shooting guns in the dark with your eyes closed”. Books don’t help the cause; I was just skimming High Performance MySQL last night (one of the many freebies from the UC). There on page 82 is a table definition with not one column but two with VARCHAR(255). Hmmm!

If you see any new applications with VARCHAR(255) they are even more lost, because MySQL 5, which has been around quite some time now, supports VARCHAR(65535). Speaking of that, has anybody seen a VARCHAR(65535)? I’d like to know.

Another example is in Sheeri’s Top 8 SQL Best Practices, Point 4, regarding storing IPs efficiently. If you log, for example, every page hit, this can be your largest table, and moving from VARCHAR(15) to INT can save you up to 11 bytes per row alone.
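
MySQL even provides the conversion functions for you; for example:

mysql> SELECT INET_ATON('192.168.1.100');  -- 3232235876, stored in a 4-byte INT UNSIGNED
mysql> SELECT INET_NTOA(3232235876);       -- '192.168.1.100' back again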

These may just be simple examples, but it’s the principle. When you define a column, consider its data; if you don’t know, then take the time to do the research and learn.

Pride

I’ve recently completed a contract and I’ve been in discussions with agents and other employers for further work. Having had one of the worst experiences in my previous work, I’ve been extra careful to ensure what I’m told at the interview/meeting stage is indeed true and accurate (in my last case it was not). I’ve also not made the assumption that an organisation that is dependent on software has placed a certain level of value on what’s in place (in my last case I did, simply due to the size of the organisation and volume of business).

So, when being asked by people what I’m seeking, outside of the technical skills and competencies, I’m seeking an organisation that places value on its existing software, its software quality, its software improvement and, most importantly, its software developers. It was unfortunate that an organisation that lived in software, and would not survive long (especially at the present scale of operations) without it, met none of these criteria. It was really sad, and overall I found the environment “depressing”.

Now, “depressing” could be considered a harsh word, but it comes from both my ethos and also present circumstances in life. I had two post-it notes on my wall at work which summed up “What I was seeking/Where I wanted to be” and “Where I was” in my job.

There have been a few articles recently; it was probably Marcus in Supporting ideas and being productive, based on the earlier articles How to Come Up With Ideas and How To Kill Good Ideas by Zack, that motivated me to publish this. These articles have many good points, and while I could put my slant on them, I wanted to focus on just one thing.

In software development 10 years ago I was driven; now I know, and I’m driven by passion. I always strive for something better than there presently is, seeking better quality and better productivity, and most importantly today, simplicity.

Everybody is at a different position in life, and it’s important to find the right fit. For me, money is not a motivator; being productive, making what I’m doing better, making where I’m working more enjoyable and productive, and making who I’m working with a better place to be are what matter to me.

I’ve chosen the word that best represents this for the moment: PRIDE.

Pride is:

  • A sense of one’s own proper dignity or value; self-respect
  • Pleasure or satisfaction taken in an achievement, possession, or association
  • A cause or source of pleasure or satisfaction; the best of a group or class
  • The most successful or thriving condition
  • To indulge (oneself) in a feeling of pleasure or satisfaction

For me, I’m seeking an organisation that takes pride in its people, its products and its future direction, and that attracts people who have pride in what they do, the way they think and how they interact with others.