Watching Replication in action

March 19, 2007 by ronald

For all those instant GUI people out there, there is an easy way to watch the present status of your MySQL Slaves using the watch command.

$ watch -n 1 -d "mysql -uroot -pxxxx mysql -e 'SHOW SLAVE STATUS\G'"

The watch provides a view of a file or command, and shows interval updates to this output (-n seconds> option). You can also specific a granularity better then one second for example 0.5. -d also highlights the differences for you. So while you see the following output with your SHOW SLAVE STATUS, on a loaded system you will also see bin-log and relay-log changes, and perhaps Seconds_Behind_Master.

The question is, Why is Seconds_Behind_Master the last column in this display?

*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: localhost Master_User: repl Master_Port: 10002 Connect_Retry: 60 Master_Log_File: master-bin.000006 Read_Master_Log_Pos: 102 Relay_Log_File: newyork-relay-bin.000055 Relay_Log_Pos: 244 Relay_Master_Log_File: master-bin.000006 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 102 Relay_Log_Space: 539 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0

Smarter indexing for column LIKE '%string%'

March 7, 2007 by ronald

With my very heavy travel load and skilling load I’ve not had time to scratch myself. It hasn’t stopped the brain working overtime on various issues including the classic find a pattern in a string starting with a wildcard character. On a recent gig I saw the true classic.

SELECT columns FROM users WHERE username LIKE '%str%' OR firstname LIKE '%str%' OR lastname LIKE '%str%'

I went through the various options and comments on leading ‘%’, OR’s, combined columns, FULLTEXT (which doesn’t work in this case), merge indexing etc, however it perplexed me that nobody has really solved this problem, or at least shared their solutions.

I have an idea, a theory and while I’d love to prove/disprove it, I simply just don’t have the time. So here are my notes, hopefully somebody can comment positively/negatively, do some research, or encourage me to pursue it more.

The Problem

The problem is quite simply the leading wildcard, Der!. So how do you eliminate the wildcard from the search?

My Idea

Working with the earlier example, and having a concatenated field of the three components (username,firstname,lastname), my idea is to encode into 27 bits, a bit for each alphabetic character (A-Z) found, and any non-alphabetic character except whitespace in bit 27.

This column is then indexed and searched for bitwise matches. Here is a pictorial description.

String

Ronald Bradford

Search: ‘brad’

*** MATCH ***

Search: ‘fred’

NO MATCH

My idea is very simple, the question is, will it actually work.

The Questions

The goal is quite obviously to get an Index utilization, to a maximum say of 10%-25% of matching rows. Factors that would affect this include:

If search string is one character, like a vowel I could this not useful, so an implied criteria for optimal work is at least 2 or 3 characters.
I see the effectiveness lost on large column values, it could be that only up to say 30 characters is optimal, I see strings of 100+ characters would be ineffective.
It doesn’t support case-sensitive searching or non ASCII searching

The Tests

Create good sized table and distribution of first/last names (A IMDB Actors table would work)
Create a concatenated column of searchable fields (e.g. details = CONCAT_WS(‘ ‘,firstname,lastname)
Create function to return bitwised search string (e.g. bitize(str))
Create Indexed bitwise column and pre-populate accordingly (e.g. bitfulldetails)
Create BEFORE INSERT|BEFORE UPDATE Triggers to populate Indexed bitwised column
Test the sucker (of course you will need to include the actual LIKE command as well in the WHERE clause)

A sample query would then be:

SELECT columns FROM users WHERE bitize('str') & bitfulldetails AND fulldetails like '%str%'

So the challenge to all those budding MySQL Gurus, does it seem plausible?

Using Innodb Primary Keys wisely

February 15, 2007 by ronald

At a customer site recently I came across the following table definition for an Innodb table. 140 columns, 3 part primary key totally 44 bytes, 2 million rows giving over 900MB in data size. And it had 15 indexes, totaling over 2.3GB in size. Add into the mix a Windows Server 2003 OS, 640MB innodb_buffer_pool_size setting and table scans out the warzoo. This all leads to very poor performance.

In is generally considered best practice for Innodb to use the shortest primary key as possible, and there is a clear reason for this. Innodb stores with every index the full primary key. So for example if an indexed column was 4 bytes in length, in this example the index row would be 48 bytes (before overhead). Fortunately an easy solution presented itself, because of this index storage requirement, Innodb will create an internal 6 byte primary key if none exists for a table. I of course had known about this but had never tried it in a production situation. I come from old school where every table is defined with a primary key.

So a ALTER TABLE [name] DROP PRIMARY KEY results in a long wait, and a reduction in the Index size to 900MB. WOOT! Now, in closer analysis the Primary Key is the Primary Key because it’s the Unique requirement for the table. No problem I just add a Unique Key to replace the previously dropped Primary Key. A check to review the Index Size showed a size of 2.3GB. What the!

It seems if you read the fine print of the MySQL documentation for Innodb Table Structures there is an exception clause. If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.

In my situation, by adding a Unique Key this was in turn converted internally to the new Primary Key. Drat! So to the Innodb developers out there. I’d like to see a way for the internally generated key to remain in this situation, or at least provide the ability for the designer to choose this capability.

The only result is to physically create an INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY column definition. This may seem like not a big thing, but the customer did not want to make any schema changes. However it was a necessity simply as the first step to get reasonable performance.

This was only the first part of the problem, solving the full table scans via creative Indexing was not possible, code changes were out of the question for an immediate fix. The real problem was the size of the innodb_buffer_pool_size. It was just too small to handle both the table data and index pages in memory. Perfmon analysis showed the machine simply when into intensive disk I/O every time any queries were run. On a Linux system with just a database and Innodb tables, it can be recommended that 70-80% of available memory can be allocated to the innodb_buffer_pool_size.

Alas this is Windows 32 bit, and their is an implied 2GB memory limit for any process, so the best one could manage in this situation was 1600MB.
The long and the short was that even with poor database design, and immediate performance improvement occurred with an optimal Primary Key and sufficient memory allocated to the correct MySQL buffers. This is only a temporary fix for a greater problem.

MySQL Camp T-Shirts

January 9, 2007 by ronald

For those that attended the MySQL Camp at Google HQ late last year you may have seen me with my own T-Shirt designs. A number of people inquired about getting them. I’ve finally got around to make them available online, so anybody that would like one can order online.

There are two different shirts. If you want your name on the shirt, you need to make sure you choose the correct one.

Early Adopters – For those that were the first 48 that signed up, your name as well as position and company are on the shirt.
The Herd – For everybody that registered on the website, your name is on the shirt.

Ok. I’ve already been asked why 48. This was the number of registrants when I got the shirt made back in Australia a week or so before the Camp.

There are also plenty more of my MySQL designs at my MySQL Wear Store.

For those that also liked the runner up pin “A mouse is a waste of a perfectly good hand”, you can also get this in it’s original graphical shirt design at Geek Cool – CLI”

MySQL Predictions for 2007

January 1, 2007 by ronald

I’m interested to know what people consider will behold MySQL in 2007?

The announcement of “You” as Time person of the year can only considered a huge boost to the opportunities in 2007. So, in 2007 here are my 7 (in no significant order).

2007 will be the year of the storage engine. We will see 5 offerings for transactional storage engines, 20+ available and practical engines for management of some form of data.
2007 will see MySQL 5.1 GA (finally).
2007 will see MySQL release it’s own Falcon Storage Engine (GA not until Q4 :-().
The MySQL Winter of Code will enable the contributions of the community to change feature development. I foresee a Bounty system from an external party or parties for MySQL Features emerging.
MySQL will make major press inroads to the RDBMS Big 3 of Oracle, SQL Server and IBM DB2.
Despite efforts of MySQL AB, major installations of MySQL 4.0 and 4.1 including large ISP’s will hamper the uptake of 5.0 and 5.1 and the de-commissioning of 4.x
A major country government will make an announcement to move to Open Source across servers and desktops, and MySQL will contribute to being an enterprise database offering in systems replacements as part of a longer term strategy.

Pluggable Storage Engines – What is the potential?

November 28, 2006 by ronald

I started this post a month ago, but after Kaj’s discussion on the same topic at the MySQL Camp I figured it was time to post.

I had dinner with a friend recently (a very smart friend), and our conversation lead him to ask “What’s different with MySQL?”. One of the things I tried to describe was the “Pluggable Storage Engine Architecture” (PSE) potential for the future that I expect will set MySQL apart from all other Open Source and even commercial databases.

Here are some details of the example I tried to provide, given somebody who understands enough of the general principles of RDBMS’s.

Consider the ability that information (intelligent data) is available within a Relational Database via the appropriate tools and language (e.g. SQL) but it is not physically constrained to Tables, Columns and Rows of data and an application to manage that data which is the present traditional approach. Let’s use images that you take with your digital camera as an example.

In a typical RDBMS application you would create an application to managed the content of your data, with a number of tables, and links to the images etc. Of course you would need an application as well to both view and manage this information.

What if, you simply pointed your database to a directory of images and then was able to query information such as photos by date, or by size, by album, from a certain location, with given keyword etc. Most of this information about digital photographs is already there. This information is encoded into an Exif format that is embedded within JPEG images.

So what’s missing from this information? Tags and Comments are the most obvious, because this information can’t be determined electronically, this is something that humans do. If you could also embedded this information into an image with a suitable tool then you could be ready to manage your photos.

A further extension would be to have Image Analysis capabilities that enabled you to search for photos that contained the sky, or people, or something that was the color red.

What if in the future, your camera’s had a built in GPS and this information recorded within Exif, and then the ability to extend your output to link to popular on line mapping software such as Google Maps would exist. You could then use your digital camera to track your moves, taking photos that could then plot your path over a holiday, and also enabling location based queries.

It was interesting to postulate what ideas may be possible in the futre. I suspect that it won’t be long before we actually see this. So what are the other potentials that you may not consider. Another example may be a MP3 Dukebox style PSE, managing all the information held with ID tags of MP3 allowing you to do with music what could be done with images.

References

Exif Example

Here is some example content of Exif using EXIF Tool

 ./exiftool ~/Desktop/2006_02_23_AirShow/IMG_5966.JPG
ExifTool Version Number         : 6.50
File Name                       : IMG_5966.JPG
Directory                       : /home/rbradfor/Desktop/2006_02_23_AirShow
File Size                       : 2 MB
File Modification Date/Time     : 2006:09:24 17:44:32
File Type                       : JPEG
MIME Type                       : image/jpeg
Make                            : Canon
Camera Model Name               : Canon EOS 300D DIGITAL
Orientation                     : Horizontal (normal)
X Resolution                    : 180
Y Resolution                    : 180
Resolution Unit                 : inches
Modify Date                     : 2006:02:23 16:01:56
Y Cb Cr Positioning             : Centered
Exposure Time                   : 1/320
F Number                        : 10.0
ISO                             : 200
Exif Version                    : 0221
Date/Time Original              : 2006:02:23 16:01:56
Create Date                     : 2006:02:23 16:01:56
Components Configuration        : YCbCr
Compressed Bits Per Pixel       : 3
Shutter Speed Value             : 1/320
Aperture Value                  : 10.0
Max Aperture Value              : 3.5
Flash                           : No Flash
Focal Length                    : 18.0mm
Macro Mode                      : Unknown (0)
Self-timer                      : 0
Quality                         : Fine
Canon Flash Mode                : Off
Continuous Drive                : Single
Focus Mode                      : AI Focus AF
Canon Image Size                : Large
Easy Mode                       : Manual
Digital Zoom                    : Unknown (-1)
Contrast                        : +1
Saturation                      : +1
Sharpness                       : +1
Camera ISO                      : n/a
Metering Mode                   : Evaluative
Focus Range                     : Not Known
AF Point                        : Manual AF point selection
Canon Exposure Mode             : Program AE
Lens Type                       : Unknown (-1)
Long Focal                      : 55
Short Focal                     : 18
Focal Units                     : 1
Max Aperture                    : 3.6
Min Aperture                    : 22
Flash Activity                  : 0
Flash Bits                      : (none)
Zoom Source Width               : 3072
Zoom Target Width               : 3072
Color Tone                      : Normal
Focal Plane X Size              : 23.22mm
Focal Plane Y Size              : 15.49mm
Auto ISO                        : 100
Base ISO                        : 200
Measured EV                     : 9.00
Target Aperture                 : 10
Target Exposure Time            : 1/318
Exposure Compensation           : 0
White Balance                   : Auto
Slow Shutter                    : None
Shot Number In Continuous Burst : 0
Flash Guide Number              : 0
Flash Exposure Compensation     : 0
Auto Exposure Bracketing        : Off
AEB Bracket Value               : 0
Focus Distance Upper            : -0.01
Focus Distance Lower            : 5.46
Bulb Duration                   : 0
Camera Type                     : EOS Mid-range
Auto Rotate                     : None
ND Filter                       : Unknown (-1)
Self-timer 2                    : 0
Bracket Mode                    : Off
Bracket Value                   : 0
Bracket Shot Number             : 0
Canon Image Type                : IMG:EOS 300D DIGITAL JPEG
Canon Firmware Version          : Firmware Version 1.1.1
Camera Body No.                 : 0930402471
Serial Number Format            : Format 1
File Number                     : 159-5966
Owner's Name                    :
Canon Model ID                  : EOS Digital Rebel / 300D / Kiss Digital
Canon File Length               : 2387078
WB RGGB Levels Auto             : 1726 832 831 948
WB RGGB Levels Daylight         : 0 0 0 0
WB RGGB Levels Shade            : 0 0 0 0
WB RGGB Levels Cloudy           : 0 0 0 0
WB RGGB Levels Tungsten         : 0 0 0 0
WB RGGB Levels Fluorescent      : 0 0 0 0
WB RGGB Levels Flash            : 0 0 0 0
WB RGGB Levels Custom           : 0 0 0 0
WB RGGB Levels Kelvin           : 0 0 0 0
Color Temperature               : 5200
Num AF Points                   : 7
Canon Image Width               : 3072
Canon Image Height              : 2048
Canon Image Width As Shot       : 3072
Canon Image Height As Shot      : 2048
AF Points Used                  : Mid-left
Preview Quality                 : Normal
Preview Image Length            : 278318
Preview Image Width             : 1536
Preview Image Height            : 1024
Preview Image Start             : 2108760
Preview Focal Plane X Resolution: 3443.9
Preview Focal Plane Y Resolution: 3442.0
User Comment                    :
Flashpix Version                : 0100
Color Space                     : sRGB
Exif Image Width                : 3072
Exif Image Length               : 2048
Interoperability Index          : R98 - DCF basic file (sRGB)
Interoperability Version        : 0100
Related Image Width             : 3072
Related Image Length            : 2048
Focal Plane X Resolution        : 3443.946
Focal Plane Y Resolution        : 3442.017
Focal Plane Resolution Unit     : inches
Sensing Method                  : One-chip color area
File Source                     : Digital Camera
Custom Rendered                 : Normal
Exposure Mode                   : Auto
Scene Capture Type              : Standard
Compression                     : JPEG (old-style)
Thumbnail Offset                : 2560
Thumbnail Length                : 7680
Image Width                     : 3072
Image Height                    : 2048
Aperture                        : 10.0
Drive Mode                      : Single-frame shooting
Flash                           : Off
Image Size                      : 3072x2048
Lens                            : 18.0 - 55.0mm
Preview Image                   : (Binary data 278318 bytes, use -b option to extract)
Preview Image Size              : 1536x1024
Scale Factor To 35mm Equivalent : 1.6
Shooting Mode                   : Program AE
Shutter Speed                   : 1/320
Thumbnail Image                 : (Binary data 7680 bytes, use -b option to extract)
WB RGGB Levels                  : 1726 832 831 948
Blue Balance                    : 1.140108
Circle Of Confusion             : 0.019 mm
Focal Length                    : 18.0mm (35mm equivalent: 27.9mm)
Hyperfocal Distance             : 1.67 m
LV                              : 14.0
Lens                            : 18.0 - 55.0mm (35mm equivalent: 27.9 - 85.3mm)
Red Balance                     : 2.075767

The desire for Performance SQL Tips

November 19, 2006 by ronald

It seems, people are clammering for a more consolidated help guide for SQL Performance tips.

Jay Pipes at the MySQL Camp ran a session Interactive Top 10 SQL performance Tips. There was plenty of input and discussion, and at the time Sheeri simply typed them into a wiki page for later work.

Well it seems even that rough list is popular at Del.icio.us ranking near the top of the Hot List on the front page. I saw it earlier and it was second or third, but didn’t think of taking a screen shot until now, but it’s still high.

I’d say that we could easily get the Top 10 for up to 10 different categories rather easily. Good luck Jay.

MySQL Replibeertion

November 13, 2006 by ronald

MySQL Replibeertion was the last scheduled session on Day 2, but not withstanding there was free beer (a lot of), there was a serious side with a Replication Discussion.

One of the first questions by Jeremy was “Are the any big replication users?” to which Sheeri quickly replied “Are you calling me fat again”.

This was a highly interactive session, here are some of the points from the audience.

Some Uses of Replication

Backup
Hot standby
Scaling
Data Warehousing
- Slaves are larger then your database
For no special reason
Consolidation of multiple sources
Support for multiple indexes

Issues

Can break
replication lag
bi-directional replication is not supported
hard to setup replication/initialization upto point to run one command
hard to know when the slave is out of sync (working but broken) diagnose
does not manage binary logs for you (max-bin-logs )
no row level replication (5.1 row based replication, change based replication, good and bad )
Serialized execution on the slave
Master does not keep track of the slaves (to the master, the slave is just another connection) Jeremy comment “it really really sucks in production systems.”
No multi-master replication. A slave can not have more then one master.
ring replication. No idea when something breaks what is right.
no ability for delayed duplication.
no way to get binlogs back. (manually twink the info file)
Master doesn’t care what data is on the slave.
A replication backup is really only good for the restoration of that machine
Default reconnection timeout is way, way to high (default of 1 minute). It should be at most 1 second with exponential fallback. (master-connect-retry), no fallback, no max number of retries, logged in error log every time.

Feature Results (Things replication needs, what you want to see)

Delayed Replication
Registered Salves in the Master
Import Binlog
Checksum Table Events (Need ability for table checksum to be added to binlog periodically so it can be checked by the slave.)
Global Sequence Number
Connect Retry Exponential Backoff
Hetrogous Replication (Oracle to MySQL). Golden Gate Software has a commercial offering
Command Exclusion List (sql_log_bin=0)
Replication filters by data on the slave
Show upcoming queries, skip query
Multi-Master to one slave
piping mysqlbin log commands into the mysql client fails for some character sets.
Binlog index capability
Checksum of Binary Events to determine a command is valid
command line interface in mysqllog so you could go backwards and forwards, then execute commands.

Check out more at Google Code Blog.

MySQL Camp – Introductions & Comments

November 11, 2006 by ronald

The great thing about this unconference, is the lack of total formal structure. For now , our first session we are having an open introduction of people, there are at good 60+ people here already, and people rolling in, and it’s great to hear people’s background, and also to bag Jeremy Cole at every opportunity. We have a variety of people from various backgrounds, companies and experience levels.

We are in the Kiev room, with power build into the desks, lots of desk space and full 360% swivel chairs. This is just another example of the company’s clear thinking about it’s requirements.

There have already been some very funny stories, I should have made more earlier notes. Here are some.

Adam Ritter (Proven Scaling ride winner) was the first to bag Oracle, really bold move with Ken Jacobs from Oracle directly behind him, and he had already made his introduction.

Paul Tuckfield of You Tube guys said to Jeremy Cole re his replication talk “I want to make it, but we have already met before.” There have been about 10 bags of Jeremy already, he is giving as good as he is getting. Proven Scaling are sponsoring Beer session tomorrow night. Great stuff Jeremy. He did also ask how many people were planning on coming, given the number of people at the MySQL Camp has tripled in the past few days.

Breaking news. Mark Callaghan from Google, “Is there anybody from You Tube here”, to which the Paul Tuckfield of You Tube identified himself. After a few quick words the Google comment was “Deals Off” which made everybody laugh. That’s been level of good interaction with people. here

Flickr DB dude (his words) Dathan Vance Pattishall said to Ken Jacobs “You work for InnoDB, right”. Ken Jacobs response was “InnoDB works for me.”. Again a lot of laughs.

MySQL Quotes

November 9, 2006 by ronald

Frank was on a role with MySQL quotes (it’s 1am here in New York – All that Red Bull & Vodka). Here are some of them:

Let me scale you!

Wanna scale.

Scale me Baby!

Backup Now!

MySQL – DBA Friendly.

MySQL – Use the Attitude.

MySQL. Be Bold!

MySQL. Look Again

MySQL – Coming to a website near you.

One small step for Data, one giant leap for DBA.

Data, we are serious about it.

My Job, My Passion. MySQL.

MySQL. Never Doubt.

MySQL. Scaling made Easy.

MySQL. Scaling all you want.

Got MySQL!

Do it with MySQL.

Scale Yourself.

MySQL or die.

I’ve also done some of my own shirts designs (see small images below), a number I already have on shirts (you can check them out on me at MySQL Camp).

Some other references include MySQL forge Merchandise and Arjen’s suggestions.

The impact of Mathematics in our RDBMS world

November 7, 2006 by ronald

We sometimes forget the impact of early mathematicians. Take for example John Venn who introduced the Venn diagram in 1881.

This is fundamental for Set Theory, and also fundamental for SQL, the basis of retrieving data in a RDBMS.

Log Buffer #13: a Carnival of the Vanities for DBAs

October 7, 2006 by ronald

Unlike fellow author Giuseppe of last week’s Log Buffer #12 I volunteered for the job of this week’s Log Buffer. Lots to say, so little time, so lets get started with Log Buffer #13.

Tom Kyte has been at the DBForum 2006 in Denmark. Apart from the contents of the Forum, his picture and comment “I spied some artifacts from Mogens Oracle Museum, a copy of the Version 3 and Version 4 Oracle” in Dbforum 2006, in the past… was an impressive look back in time. Manuals, what are they? So how old is this? Wikipedia History places Oracle Version 4 at 1984, some 22 years ago. One of the comments to Tom’s entry takes us to Back to the future (Oracle 4.1 VM appliance). The title gives the article’s content away, but worth a view of Oracle history. Good to also see Tom won the 42 Question Quiz on day 1, but what was the question he got wrong?

Ric Smith gives us a window of this month’s upcoming Oracle Open World with some details of Oracle Open World 2006 – Oracle Develop, “a new event tailored for the “geek” in us all. The format will make for a more developer-oriented conference”. Craig Mullins is at the European International DB2 User Group conference being held in Vienna this week (must be the month for RDBMS conferences). Details of his presentation “Change Control for DB2 Access Paths” are at IDUG in Vienna.

Build Your Own Oracle RAC Cluster on Linux – Again references a very detailed article and explanation by Jeffrey Hunter on RAC and shows the benefit by contributions to the OTN Oracle Community. If you’re heading to Oracle Open World this month, Justin Kestelyn mentions details of a similar presentation being held during the “Oracle on Linux Experience” portion of OTN Night on Monday Oct 23.

David Aldridge in his article Linux 2.6 Kernel I/O Schedulers for Oracle Data Warehousing: Part II has received some good responses in his concise and simple Benchmark. I always strive for simplicity in solving problems and this looks like a good simple approach to graphing I/O.

A little off the beaten track is Applying Web 2.0 to the Enterprise by Jonathan Bruce. The reason why I mention this is two fold. Firstly, decisions made by Project Management can have a big effect on the software development process, and this can have a significant effect on the DBAs and System Administrators that support systems. The article also mentions Agile Software Development of which I am a strong proponent. As I have a very detailed database background I’m also wary of some of the “strenghts” mentioned generally with Agile. A topic I’m happy to discuss more at some time.

E A D G B E. I have no idea what that means, you will need to read Ian Thain’s article regarding the Sybase WorkSpace to find out. He publishes some healthy performance improvement throughput figures with his 3 tuning guidelines.

Firebird 2.00 Release Candidate 5 has also been released this week. The news article indicates that this will probably become the release version.

Peter Scott reminds us that with all the technology advances and an existing 8 year old system which includes documentation, things still happen in I hate on call.

Greg Sabino Mullane over at Planet PostgreSQL is keeping abreast of the various open source offerings with his report on Berkeley DB now does MVCC. His comment “Looks like Oracle is actually doing something with their purchase …. Curiously, this comes right at the point when MySQL is dropping the BDB engine from their product.”. Hmmm, interesting observation, however it wasn’t the actual reason why BDB was dropped from MySQL. The actual reason mentioned some time ago can be found at BDB Engine removal.

Are we working in a booming industry? “Overall, Gartner is predicting that the worldwide DBMS market is around $14 billion and will continue to grow by nearly 7% per year”. This comment by Zack Urlocker is from his attendance of the “Gartner Open Source Summit - a very thorough analysis of the impact of open source technology in the database market.” You can read all his comments of the summit in Gartner Mastermind panel and Gartner on Open Source Databases.

Exploring the secrets of intermediate materialization by Adam Machanic revives a trick he had in SQL Server 2000 in improving logical reads when query tuning. This example shows it’s operation in SQL Server 2005.

Peter Zaitsev gives us a quick refresher on his MySQL Performance Blog with What to tune in MySQL Server after installation. A good introduction reference of configurable system variables, particularly for those non-MySQL DBA’s that need to also support a MySQL installation. Mike Kruckenberg also gives us a valuable consolidated reference in his twin articles, Guide to Incompatibilities when Upgrading MySQL to Version 4.1 and Guide to Incompatibilities when Upgrading MySQL to Version 5.0. Essential reading for clearly understanding MySQL database upgrades and possible traps.

MySQL Tools for Microsoft Visual Studio 1.0.1 beta has been released. Enough said. Ok, well for those that want some more detail, I quote from Reggie Burnett – a downloadable plug-in for Visual Studio 2005 that allows Windows developers to quickly build MySQL data-driven applications with Visual Studio. With this plug-in, developers will be able to create, modify and manage MySQL database objects with an easy-to-use interface from within the Visual Studio IDE.. If only I used Microsoft I could check it out!

Daniel Schneller highlights one of the problems in a large scale out MySQL implementation in his article MySQL replication timeout trap. Valuable information in a network infrastructure to ensure your slaves are performing optimally.

Normally I’d summarise a worthy article for review, this time I’ve reproduced the concise summary by Jason Gaylord in Preventing SQL Injection Attacks which explains his content. Scott Guthrie just posted some really good stuff about preventing SQL injection attacks. In his blog post he talks about an application that Michael Sutton created to check SQL injection attacks by screening Google search and looking for sites with QueryString, etc. Check out his post for more details: http://weblogs.asp.net/scottgu/archive/2006/09/30/tip_2f00_trick_3a00_-guard-against-sql-injection-attacks.aspx

Of the big 5 or 6 RDBMS products of the past 2 decades, DB2 is the only one that hasn’t crossed my path in some way. Willie Favero writes What’s in a name – The saga continues…, sharing his views on the official name of DB2® Version 9 for z/OS.

Jeremy Cole has been busy in recent months with his new found freedoms in his new venture Proven Scaling. He has released another MySQL Source Patch with On Triggers, Stored Procedures, and Call Stacks. Keep em comin’ Jeremy. And just as I complete this weeks Log Buffer, good mate Jay (the plumber) Pipes has published HOWTO: Making a Corresponding Test Case for your Patch. Very worthy information for all those past, present and future patch writers.

The Data Charmer Giuseppe Maxia gives us the inside goss on his recent vacation interests in Take the MySQL Certification in five steps. Good Advice, I liked Point 5, and the unofficial Point 6. Marcus Popp also points us to New Lists of Certified Candidates online so you can see your name in lights. Reminds me to stop procrastinating and to take the MySQL 5 exams myself. It’s been on the cards for a few months now.

We end this week with one of those feel good stories of something that inspires me. Paul McCullagh has written his own MySQL transactional storage engine. PBXT beta 0.97 has just been released as a Pluggable storage engine for MySQL 5.1. Quoting Paul “PBXT is the first full featured engine to be released in this form.”. This leverages a new feature in the upcoming MySQL 5.1 GA release where developers can use MySQL’s extensible Storage Engine Architecture as a plugin without the need for recompiling with MySQL source. Look out for a lot more opportunities in storing and access different types of data in the future with this feature. [Author Side Note: Compiling MySQL from the latest BK tree may contain code features that are not fully tested (e.g. 2 Oct 2006). It’s best when integrating other patches or plugins to use a known MySQL Source Snapshots, otherwise things may break!]

And with a certain amount of deja vu from last week’s closing Log Buffer #12 comment by Giuseppe, your’s truly will also be joining MySQL. Checkout my If you can’t beat them, join them.

That’s all for lucky #13. Thanks for the opportunity Dave.

If you can't beat them, join them!

October 7, 2006 by ronald

Like fellow friends and MySQL’ers before me Morgan, Roland, Giuseppe, Markus and Sean, I’ve joined the MySQL juggernaut on the ride of my life, achieving two of my short/medium term professional goals in one step. Woot!

It says something to me about the company I’m very excited to work for when I knew of all these people before they joined MySQL this year (2006). I’ll also be joining other friends and MySQL people Arjen, Jon, Jay, Colin, Michael Z and I still have a list of friends that I’ve met while being part of the MySQL community.

And as Giuseppe said I’ll be working in a virtual company. Another article I like to tell others about MySQL is MySQL: Workers in 25 countries with no HQ.

I’ll leave you with the MySQL Values from the Company About MySQL AB page.

We want the MySQL server to be:

The best and the most used database in the world
Available and affordable for all
Easy to use
Continuously improved while remaining fast and safe
Fun to use and improve
Free from bugs

MySQL AB and the people of MySQL AB:

Subscribe to the Open Source philosophy
Aim to be good citizens
Prefer partners that share our values and mindset
Answer email and give support
Are a virtual company, networking with others

Logical Data Modelling (LDM)

September 17, 2006 by ronald

Following my User Group Presentation I was asked by fellow MySQLer Kim about Logical Data Modelling (LDM), in relation to Physical Data Modelling.

Well, first the brain had to work overtime to remember when was the last time I worked on a Logical Data Model. The answer to that is 1996 doing R&D work for Oracle Corporation with their CASE repository tool, Oracle Designer, about version 1.3/1.3.2. I’ve learnt in the past 10 years to purge technical stuff from my brain, leading from the capacity in be able to remember in detail data models, data migration and data cleansing issues of projects even after leaving them 3 years eariler.

As Kim pointed out, he thinks physically, actually directly at the SQL level, then working backwards to produce an appropiate physical model. To think logically is to consider the entities and attributes and relations before considering the physical tables, columns and relationships. So how do you program somebody to think logically? In the case with Kim, he is undertaking formal studies after already grasping the concepts of software development. Generally today more people don’t undertake the formal education and we end up with The Hobbist and the Professional Syndrome.

I guess in summary I’d argue why bother. Does anybody still do traditional logical models? Feedback/comments are welcome. (Professionally I believe there is a place for LDM).
However for the purpose of the exercise lets start with a Physical Representation and present a Logical Representation of that to see the differences.

From this you can see a classic Library example, showing a table of Books and a Table of Authors, and an intersection table (3NF) to indicate that a Book can have one to many Authors, and likewise an Author can have one to many Books.

So how would this have been represented in a Logical Model.

Some things to consider in Logical Modelling.

Attributes really only require a name, and perhaps just a datatype using number, string, date (Jim Starkey would be happy), but nothing specific like SMALLINT, or VARCHAR(50).
An Attributes Mandatory/Optional state may or may not be known
Information is not in 3NF. i.e. relationships can be many to many (in our example, there is no concept of the intersection table

I’m going from memory here, so there is probably more points to consider.

So how do you teach this when you are trying to work backwards, when I learnt this back in 1988 (some 18 years ago), I’d never created a database or used SQL so I didn’t have the history. Arjen had a brilliant idea to consider Logical Modelling after the fact. Views. Consider an end user requirement for Reporting, and how you would represent your model better to an end user (effectively de-normalising these views so users don’t have to know about joins).

In this case you would create a view of Books and a View of Authors. Details such as mandatory/optional isn’t important to the end user report (ie, it’s not like it’s needed to be enforced), and specific datatype details again are not that important. The basics to know how to format a number or a date works.

An interesting approach that worked well in our explaining.

The Text Book

So it was interesting to go back to the text book using C.J. Date’s An Introduction to Database Systems to review definitions.

Logically isn’t really referenced, the term used is conceptual. A conceptual definition is “The right way to do database design is to get the logical design right first, without paying any attention whatsoever at that stage to physical – that is, performance – considerations”

Also for example Third Normal Form we get from Section 12.3 First, Second and Third Normal Forms.

Third normal form (very informal definition): A relvar is in 3NF if and only if the monkey attributes (if any) are both:

a. Mutually independent
b. Irreducibly dependent on the primary key

We explain the terms nonkey attribute and mutually independend (loosely) as follows:

A nonkey attribute is any attribute that does not participate in the primary key of the relvar concerned.
Two or more attributes are mutually independent if none of them is functionally dependent on any combination of the others. Such independence implies that each attributes can be updated independently of the rest.

Man, no wonder many years of experience and having generally seen most cases, enables me to forget this and not feel like I’ve fogotten something.

The Hobbyist and the Professional

September 17, 2006 by ronald

I first coined this term in February 2006 in a paper titled “Overcoming the Challenges of Establishing Service and Support Channels” for the conference “Implementing Open Source for Optimal Business Performance” View Paper.

I must have referenced this several times, time to give this topic it’s due notice. In summary it targets three areas:

the lack of appropriate database design in open source projects
the lack of coding standards
the lack of sound programming principles

Here are the bullet points from the slide in a presentation.

Hobbyist

Download-able software and examples
Online tutorials
Books like Learn in 24 hours/For Dummies

Professional

Formal Qualifications
Grounding in sound programming practices
Understanding of SDLC principles
Worked in team environment

Middle Ground Developer

Time to skill verses output productivity
Depends on environment and requirements

Slow Queries aren't always that bad!

September 16, 2006 by ronald

Well, now I have your attention, Slow Queries are bad (unless it’s a single user system and you don’t care). However there are worse things then slow queries in a large enterprise system.

I’ve been asked in recent weeks a number of questions which has brought this topic to discussion, as well as a current implementation I’m undertaking for a client of a purchased product.

High volume repetitive queries can have a worse effect on your system’s performance. Combined with slow queries that take locks, these queries can have an extreme effect on performance and if you don’t know your application, or have the right tools, it can be initially hard to diagonise easily.

This problem is the classic Wasting CPU cycles problem, seen it before, will see it again.

Here’s a classic example for reference. Using the current product that I’m customising and installing (I’ve not be involved in any development), a typical Customer Information System (CIS).

The system when deployed will have 1.1 Million Customers. Each Customer has a Balance. Now the Customer Balance is not recorded against the customer record, it’s calculated every time it’s required. So, no big deal, well yes. The stored procedure to calculate the balance hit’s 4 different tables, and one of these tables is one of the largest in the system (recording all detailed financial transactions). Multiple batch reports alone that work with large sets of customers all require or use the current balance in some means.

There are only a small set of transactions that affect the balance including Invoice Statements, Payments, Adjustments, Credit Control, Write-Offs . These processes by there own nature are either batch or small online transactions. The calculation of the balance can be applied against the Chunk of customers in a batch, or individual Row for online transactions (See The RAT and the CAT).

Even for the paranoid, the re-calculation of the customer balance for all customers in batch is more efficient after hours, yet only customers where some transaction that has been applied since the last time the re-calculation ran is necessary.

MySQL doesn’t be default have any analysis tools to identify and manage these types of queries. Peter Zaitsev in his article Slow Query Log analyzes tools highlights the issue with a good approach by a source code change to long_query_time as well as some additional scripts.

Hopefully MySQL will consider a more improved approach in future releases that doesn’t require patching the source code. Now it’s unlikely that MySQL will do another SHOW STATUS Gotcha and for example change the unit of measure from seconds to milliseconds, but they could make it a float, so you could specify 0.1 for example.

Stories that impress and motivate you

September 16, 2006 by ronald

I’ve worked for two Internet startup companies, both around 2 years each, both now long dead. The first was due to eventual lack of new VC funds, the second gross financial managment in the second year (apparently, when we were told there was no money December one year to pay us, the company that made large profits every month for over the first year, then had made losses every month for the past 12 months, but nobody knew about it. There were 5 Directors from 3 countries and nobody knew. Yeah Right!)

I’ve learnt a lot of non IT street smarts in this time. The first startup took the VC route, and after 3 rounds while I wasn’t involved in the process you pick up things. The single biggest tip here is the Bell-Mason Diagnostic. Here a few introduction references worthy of a quick review (One, Two).

When you take any great idea, and then consider the 4 quadrants and 12 axis you realise you really need to make larger circles of professional contacts.

Zac’s article Valley Boys Run MySQL talks about the new breed of Web entrepreneurs and Web 2.0. In particular check out
Valley Boys Digg.com’s Kevin Rose leads a new brat pack of young entrepreneurs. This is the new wave of success that works without VC.

For me this recent post on the Meebo Blog really impressed me.

365 days ago

Nineteen releases, eleven (fantastic) team members, and 295,321 gummy bears later we see over a million users log into meebo each day. As a coder, all you hope is that your service will be able help people go about their day-to-day lives. Stability, bugs, and good usability are always top of mind.

1,000,000 users of your Web 2.0 application. This impresses and motivates me. What makes it possible to anybody, is you can get a LAMP stack, a live-cd, cheap hosting and you can turn your idea into something real for next to no cost. Of course, I won’t start on the nightmares out there of great ideas that are very poorly designed. At least the underlying stack can support anything you want to achieve, and MySQL is behind these success stories.

One more thing on Meebo, Check out the meebo map!. I’ve been told that Google has something of a similar nature at the Googleplex. Well I can say I’m very keen to see this, and will be 8 weeks time when the First MySQL Camp is held.

MySQL Trigger Features

September 14, 2006 by ronald

Sheeri talked a little about MySQL Triggers in One TRIGGER Fact, Optimizer Rewriting Stuff. While it’s great that MySQL 5.0 has Triggers, there are some things that I feel could become features in future releases.

IF EXISTS

One of the beautiful features that MySQL has is IF EXISTS. This ternary operation that if the object exists performs the operation, of not it does nothing works wonders in reviewing logs for errors. One of the problems with Oracle for example, is the requirement to ignore the ORA errors for non-existent objects.

But this functionality doesn’t exist for Triggers? One must wonder why. I’d like to see this.

mysql> DROP TRIGGER IF EXISTS trigger_name;

MySQL Manual DROP TRIGGER

REPLACE

On feature that simplifies the lack of IF EXISTS functionality using Oracle is REPLACE. The syntax is:

oracle> CREATE OR REPLACE TRIGGER trigger_name...

In this case, this functionality effectively eliminates the need for a DROP IF EXISTS, however I’d not like to see this introduced into MySQL especially it’s an Oracle specific syntax and not ANSI SQL.

Multiple Triggers of same type per table

MySQL only allows one Trigger per type per table. That is, there can only be one BEFORE UPDATE for example. While you may ask the question why you would need this functionality. Here is a typical situation.

You use Triggers to perform some level of business functionality, determining values for optimised (denormalised) columns is a good example. So you need to write an appropiate trigger for that piece of functionality.
You also use Triggers to perform database auditing, that is for every insert/update/delete of data, you record a full copy of the change in an audit database. One way to ensure this is consistent across the entire database is to implement via triggers. So you leverage programming functionality to pre-create triggers for all tables to manage this.

The problem with MySQL occurs in that you have to now merge these triggers for tables that require both. If you want to deploy your application into a test environment, you may wish to not deploy your auditing triggers, but now you have this functionality mixed in with business logic.

Multi Type Triggers

Another cool Oracle feature is the capacity to define a trigger for multiple types. For example:

oracle> CREATE OR REPLACE TRIGGER trigger_name
oracle> BEFORE INSERT OR DELETE OR UPDATE ON table_name ...

In this example, with MySQL you would need to create three seperate triggers.

Other features

While not as important and one would need to consider if necessary in MySQL are some other Oracle provided trigger functionality. This includes:

WHEN CLAUSE trigger restriction
Triggers on DDL Statements such as CREATE, DROP, ALTER
Triggers on Database events such as LOGON, LOGOFF, STARTUP, and SHUTDOWN
INSTEAD OF Triggers (for Views)
STATEMENT based trigger

For more information you can check out the Oracle Documentation on Introduction to Triggers, CREATE TRIGGER, Documentation Search on Triggers

Brisbane Users Group – MySQL Hackfest

September 6, 2006 by ronald

Last night we had a number of keen souls at the Brisbane MySQL User Group. I was very impressed to see the majority of people with laptops at hand.

You can download my slides and code examples at my Articles page.

In our hands on Hackfest tutorial we created the new command SHOW USERGROUPS. Before anybody makes a comment, it was stated in the presentation that this command was made a dummy one, and is a poor candidate for two reasons.

The results should be more dynamic, rather then hardcoded into the source tree
USERGROUPS is not an ideal name due to comparisions to Users, Groups, Roles etc

Still it was productive, here was the outcome of our work for the evening.

mysql> SHOW USERGROUPS;
+----------------------+-------------------------+-----------------------------+
| Name                 | Location                | Website                     |
+----------------------+-------------------------+-----------------------------+
| Brisbane Users Group | Brisbane, QLD Australia | http://mysql.meetup.com/84/ |
| Boston Users Group   | Boston, USA             |                             |
+----------------------+-------------------------+-----------------------------+
2 rows in set (0.00 sec)

The trailing slash ‘/’ Hitcho’s contribution

mysql> SHOW AUTHORS;
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| Name                           | Location                              | Comment                             |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| Brian (Krow) Aker              | Seattle, WA, USA                      | Architecture, archive, federated, bunch of little stuff :)           |
...
 Brisbane Users Group           | Brisbane, QLD Australia               | New Command SHOW USERGROUPS                             |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
75 rows in set (0.05 sec)

I’ll be publishing more beginner tutorials in my MySQL Compiling Category over time.

Update Check out http://www.arabx.com.au/hackfest.diff.htm for a diff of files to produce this output.

Compiling MySQL Tutorial 3 – Debugging Output

September 5, 2006 by ronald

Continuing on from Tutorial 2.

When reviewing the 2.1. C/C++ Coding Guidelines for MySQL, you will see that the MySQL Source uses within the C/C++ code DBUG (Fred Fish’s debug library). This is one of the 3rd party open source products that are documented in the Internals 1.4. The Open-source Directories.

You will also find some usage in the MySQL Manual E.3 The DBUG Package. So enough talk, how do you use this.

DBUG

You get the DBUG output by running a mysqld debug version with the argument –debug. The output for the SHOW AUTHORS commands is:

T@9190304: >dispatch_command
T@9190304: | >alloc_root
..
T@9190304: | query: SHOW AUTHORS
T@9190304: | >mysql_parse
T@9190304: | | >mysql_init_query
T@9190304: | | | >lex_start
T@9190304: | | | | >alloc_root
T@9190304: | | | | | enter: root: 0x9590b38
T@9190304: | | | | | exit: ptr: 0x95c1c70
T@9190304: | | | | <alloc_root
T@9190304: | | | <lex_start
T@9190304: | | | >mysql_reset_thd_for_next_command
T@9190304: | | | <mysql_reset_thd_for_next_command
T@9190304: | | <mysql_init_query
T@9190304: | | >Query_cache::send_result_to_client
T@9190304: | | <query_cache ::send_result_to_client
T@9190304: | | >mysql_execute_command
T@9190304: | | | >mysqld_show_authors
T@9190304: | | | | >alloc_root
T@9190304: | | | | | enter: root: 0x9590b38
T@9190304: | | | | | exit: ptr: 0x95c1c78
T@9190304: | | | | <alloc_root
...
T@9190304: | | | | >send_fields
T@9190304: | | | | | packet_header: Memory: 0x8c2480  Bytes: (4) 01 00 00 01
T@9190304: | | | | | >alloc_root
T@9190304: | | | | >Protocol::write
T@9190304: | | | | <protocol ::write
...
T@9190304: | | | | >send_eof
T@9190304: | | | | | packet_header: Memory: 0x8c2550  Bytes: (4) 05 00 00 51
T@9190304: | | | | | >net_flush
T@9190304: | | | | | | >vio_is_blocking
T@9190304: | | | | | | | exit: 0
T@9190304: | | | | | | <vio_is_blocking
T@9190304: | | | | | | >net_real_write
T@9190304: | | | | | | | >vio_write
T@9190304: | | | | | | | | enter: sd: 23, buf: 0x0x95b9bd0, size: 5107
T@9190304: | | | | | | | | exit: 5107
T@9190304: | | | | | | | <vio_write
T@9190304: | | | | | | <net_real_write
T@9190304: | | | | | <net_flush
T@9190304: | | | | | info: EOF sent, so no more error sending allowed
T@9190304: | | | | <send_eof
T@9190304: | | | <mysqld_show_authors
T@9190304: | | <mysql_execute_command
T@9190304: | | >query_cache_end_of_result
T@9190304: | | <query_cache_end_of_result
...
T@9190304: | <mysql_parse
T@9190304: | info: query ready
T@9190304: | >log_slow_statement
T@9190304: | <log_slow_statement
T@9190304: >dispatch_command

Using some of the debug options you get the following output with –debug=d,info,error,query,general,where:O,/tmp/mysqld.trace

create_new_thread: info: creating thread 3
create_new_thread: info: Thread created
?func: info: handle_one_connection called by thread 3
?func: info: New connection received on socket (23)
?func: info: Host: localhost
?func: info: vio_read returned -1,  errno: 11
thr_alarm: info: reschedule
process_alarm: info: sig: 14  active alarms: 1
?func: info: client_character_set: 8
check_user: info: Capabilities: 238213  packet_length: 16777216  Host: 'localhost'  Login user: 'rbradfor' Priv_user: ''  Using password: no Access: 0  db: '*none*'
send_ok: info: affected_rows: 0  id: 0  status: 2  warning_count: 0
send_ok: info: OK sent, so no more error sending allowed
do_command: info: vio_read returned -1,  errno: 11
thr_alarm: info: reschedule
process_alarm: info: sig: 14  active alarms: 1
do_command: info: Command on socket (23) = 3 (Query)
dispatch_command: query: SHOW AUTHORS
send_eof: info: EOF sent, so no more error sending allowed
dispatch_command: info: query ready
do_command: info: vio_read returned -1,  errno: 11
thr_alarm: info: reschedule
process_alarm: info: sig: 14  active alarms: 1
do_command: info: Command on socket (23) = 1 (Quit)
~THD(): info: freeing security context
end_thread: info: sending a broadcast
end_thread: info: unlocked thread_count mutex

I have found the following format good for comparing with the MySQL Source Code. —debug=d:N:F:L:O,/tmp/mysqld.trace

   49:   sql_parse.cc:  1536: do_command: info: Command on socket (23) = 3 (Query)
   50:     my_alloc.c:   172: alloc_root: enter: root: 0x9068b38
   51:     my_alloc.c:   219: alloc_root: exit: ptr: 0x9099c30
   52:   safemalloc.c:   128: _mymalloc: enter: Size: 16384
   53:   safemalloc.c:   197: _mymalloc: exit: ptr: 0x909cc80
   54:   safemalloc.c:   262: _myfree: enter: ptr: 0x9095bf8
   55:   sql_parse.cc:  1757: dispatch_command: query: SHOW AUTHORS
..
   88:    sql_show.cc:   385: mysqld_show_authors: packet_header: Memory: 0x940590  Bytes: (4)
..
  166:    protocol.cc:   385: send_eof: packet_header: Memory: 0x940550  Bytes: (4)
05 00 00 51
  167:    viosocket.c:   184: vio_is_blocking: exit: 0
  168:    viosocket.c:   105: vio_write: enter: sd: 23, buf: 0x0x9091bd0, size: 5107
  169:    viosocket.c:   117: vio_write: exit: 5107
  170:    protocol.cc:   342: send_eof: info: EOF sent, so no more error sending allowed
  171:     sql_lex.cc:   199: lex_end: enter: lex: 0x9068b58
  172:   sql_parse.cc:  1796: dispatch_command: info: query ready
  173:     my_alloc.c:   331: free_root: enter: root: 0x9068b38  flags: 1
  174:    viosocket.c:   184: vio_is_blocking: exit: 0
  175:    viosocket.c:    36: vio_read: enter: sd: 23, buf: 0x0x9091bd0, size: 4
  176:    viosocket.c:    49: vio_read: vio_error: Got error 11 during read
  177:    viosocket.c:    52: vio_read: exit: -1
  178:   sql_parse.cc:   809: do_command: info: vio_read returned -1,  errno: 11

  190:   sql_parse.cc:  1536: do_command: info: Command on socket (23) = 1 (Quit)
  191:     my_alloc.c:   331: free_root: enter: root: 0x9068b38  flags: 1
  192:      mysqld.cc:  1681: close_connection: enter: fd: socket (23)  error: ''
  193: sql_handler.cc:   628: mysql_ha_flush: enter: tables: (nil)  mode_flags: 0x02

This will make a lot more sense with my upcoming presentation where you will see clear use and interaction between source files such as sql_parse.cc & sql_show.cc.

For more information check out Making Trace Files

It's makes me cry

August 31, 2006 by ronald

I got home today and sat down to read my home email list. Nothing new. But on a MySQL mailing list, there was an enquiry why performance was slowing in a given application. I didn’t even have to read the situation, nor the problem, it took less then the 200ms mentioned to identify the problem looking at the supplied schema.

In summary, the first table in the schema had a primary key of VARBINARY(255) and a engine type of Innodb. Hold on, wait, it’s a concatenated key of two VARBINARY(255) columns. And I should mention, that primary key was a foreign key in the next table. If this was a home website app with one user, ok well it’s still bad, but this application was having performance problems with reasonable volumes of transactions, it’s not a beginner application. (A recent reference If you don’t know your data, you don’t know your application). Where do people learn this from!

Now I want to work on a large scale out MySQL environment, an organisation or site that’s going places, but the world of even normal every day applications is littered with these basic fundamental design errors. It’s make me cry.

My last contract had many flaws across the business, and the Java code for production applications was just as bad. Check out things like What constitutes a good error message to the user?, Why IT professionals get a bad name (I could easily write one of these a day for a long time, or you could check out The Daily WFT). You think the new contract with an large existing multinational company and world-wide installed Java application I could get an improvement. I’ll leave these stories for another time. I don’t want to have to deal with this.

I’ve got 16 half written blogs on cool MySQL features or interesting topics, but I never seem to get the motivation at the end of the day. I just want to know, how do people in IT get away with this. Where is the Pride in people’s work, where is the desire for people to make what they do a better place. Where is:

An opportunity that’s a challenge, including involving hard work and tight time frames, but a job that provides the rewards of job satisfaction for a productive contribution. An importance on quality, emphasis on continued improvement, and goals of simplicity in complex situations while working in a team environment are also necessary.

SHOW STATUS Gotcha

August 27, 2006 by ronald

Well, it’s Sunday night so I will put this down to being the weekend. The background to being caught out is a request I made to my local Users Group mailing list for some information on people’s environments because I wanted to some empirical data analysis without having any more knowledge of the systems.

In summary (without the surrounding fan-fare, I was seeking):

SELECT VERSION();
SHOW STATUS;
SHOW VARIABLES;  // Optional

I was however perplexed why my first data point analysis (Read/Write ratio) using the Status values Com_insert, Com_update, Com_delete and Com_select was not always giving me expected results. In particular, a number of server results showed 0 for values while I knew the results came from working MySQL environments.

So, sanity check with good friend Morgan and I get the response to answer the dilemma SHOW STATUS defaults to session based statistics in 5.0+, there’s a million people it’s caught out, and many bugs about “xyz variable not being incremented”.

So I’m really seeking:

SELECT VERSION();
SHOW GLOBAL STATUS;
SHOW GLOBAL VARIABLES;  // Optional

Well, that’s my MySQL trivia point for the day. MySQL between version 4.1 and 5.0 made a major change with the default value. The MySQL 5.0 Manual also states this.

UPDATE 23 Sep 2006

Following Eric’s Show @&!# status again!, I did then read the Bug #19093, and it was great to see Monty comment (that’s open source for you, enabling even founders to mix with the common folk), and I’d tend to agree to his comments. MySQL has attempted to correct something.

Perhaps the best compromise to the problem is to deprecate and drop the SHOW STATUS command, and make the scope component GLOBAL|SESSION compulsory, so SHOW GLOBAL STATUS or SHOW SESSION STATUS are the only valid commands!

The RAT and the CAT

August 24, 2006 by ronald

No, it’s not a bedtime story, is a serious system’s design concept and I’m amazing that people don’t know about this.
As I mentioned in If you don’t know your data, you don’t know your application I was doing a Java Code Review, and I found a clear case of a much simplier solution. How simple you ask?

Well, without completing the task 100%, I achieved in less then 1 day (and lets say for the argument 1 more day of work), what is being worked on by somebody else for a week, with an estimate of 2 more weeks to complete. So let’s add 50% to my estimate, that’s a total of 3 days verses 15 days. You do the math. and yes that was last week and that task is still being worked on the same way, even with reference to my working code. Not to mention the code is a similiar magnitude of simplicity, and simplicity means cost savings in support, people so quickly forget that.

So what is this RAT and CAT:

RAT – Row At a Time
CAT – Chunk At a Time

This concept is really rather simple, however it’s amazing how implementations never consider this. Let me give you a simple example.

Your importing some external data (a flat file of 5 columns, and is inturn will be stored in 2 tables in a normalised form). So the RAT way of doing things would entail:

Using the language in question, open the file, read line by line, converting into object for reference.
Now for each row read you
- You get the values (A and B), and then do a select to see if this row already exists in one of the normalised tables (calling in X). If it does, you have the surrogate key, else you insert the row and get the surrogate key
- Now you do the same with the next set of values (C, D, E) which reference the normalised values (A and B), inserting into Y)
- And so on and so on.
- Report exceptions line by line if found

For those now laughing, it’s not really funny, it’s sad that programs are written this way. For those that say, but that’s the only way, well welcome to a different, radical and hard to learn approach.

The Cat way of doing things would entail:

Create a temporary table
Bulk load the data into the temporary table via appropiate command, e.g. mysqlimport or LOAD DATA.
Using one select, insert into X rows from temporary table that are not already present
Using one select, insert into Y rows from the temporart table that are not present, joining to X to get the appropiate surrogate key
Report exceptions via one select of data that wasn’t inserted due to some rules

It’s not rocket science people.

I should mention this is a simple example, and it’s not always possible to do CAT processing for all operations, but generally some portion of batch work can be, and the remaining must be done in a RAT way.

Updated
Some more recent articles including We need more CATs (2009) and The Art of Elimination (2010)

If you don't know your data, you don't know your application.

August 24, 2006 by ronald

The art of data modelling is definitely lost on some [most] people, or they never found it, even though they think they did. Over dinner with good friend Morgan last night we were swapping present stories on the topic.

Morgan wrote recently about I want my 4 bytes back damn it., and interesting example storing an ISBN. Further reference can be found at Getting started with MySQL of a more impractical ISBN example.

Disk is cheap now, so the attitude and poor excuse can be, well a few extra bytes doesn’t matter. Well no! If your a social hacker and have a website with a maximium concurrent connections of 2 maybe, but much like some recent Java Code Reviewing I just performed, just because the system isn’t 24×7, doesn’t give you excuse to be lazy about writing the code not to handle concurrency, thread safety and also as efficient as possible, in this case RAT verses CAT. (I’ll need to write about this, it seemed to go over some of the other professionals even)

I can remember a very specific example some 10 years ago in doing some performance analysis on a site. I’d identified the need for an additional index on a table. Now this table was sized for 200 million rows, and it already had about 70 million. The problem was adding another index required 4GB disk allocation. These things have an effect on sizing, growth and backups.

So the impact on appropiate sizing can clearly have an effect, if it was just one poorly sized column that’s acceptable (just), but normally it’s a pattern that litters a data model.

What’s important to realise is, it’s not just diskspace, it’s also memory. Without really touching on sizing data, I did mention some examples previously in Improving Open Source Databases – WordPress. Here the use of BIGINT(20) for primary keys proved my point. That’s 8 bytes, but unless you have going to have 4 billion categories, it’s a waste. It’s a waste when it’s a foreign key in a table, and it’s a big waste when it’s indexed, and that index is then in memory, and wasting more precious resources.

So how to do identify when the designer of the model has no idea about the intrinsic data value being stored? If you see tables with VARCHAR(255), that’s a clear sign. They have no idea regarding the data, so a default limit is used. Morgan referred to it as “Shooting guns in the dark with your eyes closed”. Books don’t help the cause, I was just skimming High Performance MySQL last night (one of the many freebies from the UC). There on page 82, is a table definition with not one column, but two with varchar(255). Hmmm!

If you see any new applications with VARHAR(255) they are even more lost, because MySQL 5, which has been around quite some time now, supports VARCHAR(65535). Speaking of that, has anybody seen VARCHAR(65535). I’d like to know.

Another example, is in Sheeri’s Top 8 SQL Best Practices Point 4 in regards to storing IP’s effeciently. If you log for example every page hit, this can be your largest table, and moving from varchar(15) to int can save you upto 11 bytes per row alone.

These may just be simple examples, but it’s the principle. When you define a column, consider it’s data, if you don’t know then take the time to do the reasearch and learn.

Pride

August 11, 2006 by ronald

I’ve recently completed a contract and I’ve been in discussions with agents and other employers for further work. Having had one of the worse experiences in my previous work, I’ve been extra careful to ensure what I’m told at the interview/meeting stage is indeed true and accurate (in my last case it was not). I’ve also not made the assumption that an organisation that is dependent on software has placed a certain level of value on what’s in place. (in my last case I did, simply due to the size of the organisation and volume of business).

So, when being asked by people what I’m seeking, outside of the technical skills and compendencies, I’m seeking an organisation that places value on it’s existing software, it’s software quality, it’s software improvement and most importantly it’s software developers. It was unfortunate that for an organisation that lived in software, and would not survive long (especially at the present scale of operations) without it, they met none of these criteria. It was really sad, and overall I found the environment “depressing”.

Now “depressing” could be considered a harsh word, but it comes from both my ethos and also present circumstances in life. I had these two post-it notes on my wall at work, which summed up “What I was seeking/Where I wanted to be”, and “Where I was” in my job.

There have been a few articles recently, it was probably Marcus in Supporting ideas and being productive, based on earlier articles of How to Come Up With Ideas and How To Kill Good Ideas by Zack that motivated me to publish this. These articles have many good points, and while I could put my slant on it, I wanted to focus on just one thing.

In Software Development 10 years ago I was driven, now I know and I’m driven by passion. I always strive for something better then their presently is, seeking better quality, and better productivity, and most importantly today, simplicity.

Everybody is at a different position in life and it’s important to find the right fit. For me money is not a motivator, being productive, making what I’m doing better, where I’m working more enjoyable and productive, and who I’m working with a better place to be are important to me.

I’ve choose the word for the moment to best represent this as PRIDE.

Pride is:

A sense of one’s own proper dignity or value; self-respect
Pleasure or satisfaction taken in an achievement, possession, or association
A cause or source of pleasure or satisfaction; the best of a group or class
The most successful or thriving condition
To indulge (oneself) in a feeling of pleasure or satisfaction

For me, I’m seeking an organisation that takes pride in it’s people, it’s products and it’s future direction, and that attracts people that have pride in what they do, they way they think and how they interact with others.

Compiling MySQL Tutorial 2 – Directly from the source

August 6, 2006 by ronald

Should you want to be on the bleeding edge, or in my case, don’t want to download 70MB each day in a daily snapshot (especially when I’m getting build errors), you can use Bit Keeper Free Bit Keeper Client that at least lets you download the MySQL Repository. This client doesn’t allow commits, which is a good thing for those non-gurus in mysql internals (which definitely includes me).

wget http://www.bitmover.com/bk-client.shar
/bin/sh bk-client.shar
cd bk_client-1.1
make

By placing sfioball in your path you can execute.

sfioball bk://mysql.bkbits.net/mysql-5.1 mysql-5.1

This took me about 4 mins, which seemed much quicker then getting a snapshot!

You can then get cracking with my instructions at Compiling MySQL Tutorial 1 – The Baseline.

A good reference in all this compiling is to take a good look at the MySQL Internals Manual. (which I only found out about recently)

If at a later time you want to update your repository to the latest, use the following command.

update bk://mysql.bkbits.net/mysql-5.1 mysql-5.1

Documentation References

5.1 Reference Manual – 2.9. MySQL Installation Using a Source Distribution
5.1 Reference Manual – 2.9.3. Installing from the Development Source Tree
5.1 Reference Manual – 2.13.1.3. Linux Source Distribution Notes
MySQL Internals Manual

Requirements for compiling

To confirm earlier notes on minimum requirements for compiling the following details should be confirmed.

automake --version
autoconf --version
libtool --version
m4 --version
gcc --version
gmake --version
bison -version

My results running Fedora Core 5.

$ automake --version
automake (GNU automake) 1.9.6
$ autoconf --version
autoconf (GNU Autoconf) 2.59
$ libtool --version
ltmain.sh (GNU libtool) 1.5.22 (1.1220.2.365 2005/12/18 22:14:06)
$ m4 --version
GNU M4 1.4.4
$ gcc --version
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
$ gmake --version
GNU Make 3.80
$ bison -version
bison (GNU Bison) 2.1

Compiling MySQL Tutorial 1 – The Baseline – Update

August 5, 2006 by ronald

Just to confirm my earlier confusion about verified snapshots at Compiling MySQL Tutorial 1 – The Baseline.

“Daily snapshot sources are only published, if they compiled successfully (using the BUILD/compile-pentium-debug-max script) and passed the test suite (using make test). If the source tree snapshot fails to compile or does not pass the test suite, the source tarball will not be published.”

Seems the fine print at MySQL Database Server 5.1: Beta snapshots also states this. Well, need to take my RTFM pill there.

Thanks to Lenz for putting the record straight, and helping with my Forum Post. Seems I did uncover a Bug, now recorded as Bug #21463. Just need to get it fixed to continue on my tutorial.

Compiling MySQL Tutorial 1 – The Baseline

August 2, 2006 by ronald

Pre-requisites

This tutorial is aimed at Linux installations that has the standard development tools already installed. The INSTALL file in the source archives provides good details of the software required (e.g. gunzip, gcc 2.95.2+,make).

Create a separate user for this development work.

su -
useradd mysqldev
su - mysqldev

Get an appropiate Snapshot

You can view recent snapshots at http://downloads.mysql.com/snapshots/mysql-5.1/.

The official statement on snapshots from MySQL AB.
Daily snapshot sources are only published, if they compiled successfully (using the BUILD/compile-pentium-debug-max script) and passed the test suite (using make test). If the source tree snapshot fails to compile or does not pass the test suite, the source tarball will not be published.

At the time of producing this the present snapshot was http://downloads.mysql.com/snapshots/mysql-5.1/mysql-5.1.12-beta-nightly-20060801.tar.gz

mkdir -p $HOME/src
cd $HOME/src
SNAPSHOT="mysql-5.1.12-beta-nightly-20060801";export SNAPSHOT
wget http://downloads.mysql.com/snapshots/mysql-5.1/${SNAPSHOT}.tar.gz
tar xfz ${SNAPSHOT}.tar.gz
cd $SNAPSHOT

Compiling

You can for example do the simple way with most C programs:

DEPLOY=${HOME}/deploy; export DEPLOY
./configure --prefix=${DEPLOY}
make
make install

However it’s recommended you use the significant number of pre-configured build scripts found in the BUILD directory. For Example:

./BUILD/compile-pentium-debug --prefix=${DEPLOY}

I always get the following warnings. Not sure what to do about it, but it doesn’t break any future work to date. Surely somebody in the development team knows about them.

../build_win32/include.tcl: no such file or directory
cp: cannot create regular file `../test/logtrack.list': No such file or directory

Testing

The easy, but time consuming part over, let’s test this new beast.

make install

This will deploy to the preconfigured area of $USER/deploy. For multiple versions within this process you should adjust this back in the compile process.

cd $DEPLOY
bin/mysql_install_db
bin/mysqld_safe &
bin/mysql -e "SELECT VERSION()"
bin/mysqladmin -uroot shutdown

Changes

So if you haven’t already sniffed around there are some reasonable changes in the structure. Here are a few points that caught me out:

mysql_install_db is now under /bin not /scripts, infact there is no /scripts
mysqld is not under /bin but infact under /libexec. A good reason to always used mysqld_safe
by default the data directory is /var, normal documentation etc always has this as /data

This is a quick confirm it all works, and I’ve for the purposes of this example ensured that no other MySQL installation is running, and there is no default /etc/my.cnf file. (I always place this in the MySQL installation directory anyway).

Some minonr considerations for improvements with locations, users and multiple instances.

cd $DEPLOY
rm -rf var
bin/mysql_install_db --datadir=${DEPLOY}/data --user=$USER
bin/mysqld_safe --basedir=${DEPLOY} --datadir=${DEPLOY}/data --user=$USER --port=3307 --socket=/tmp/mysqldev.sock &
bin/mysql -P3307 -S/tmp/mysqldev.sock -e "SELECT VERSION()"
bin/mysqladmin -P3307 -S/tmp/mysqldev.sock  -uroot shutdown

Troubleshooting

Now, there is no guarantee that the snapshot is a working one, in the case of mysql-5.1.12-beta-nightly-20060801 in this example, it didn’t work for me? I’ve just reinstalled my OS to Fedora Core 5, and the previous source version (a 5.1.10 snapshot worked ok)
Here are some tips to help out.

The preconfigured BUILD scripts have a nice display option with -n that allows you to see what will happen.

./BUILD/compile-pentium-debug --prefix=${DEPLOY} -n

In my case it gave me this:

testing pentium4 ... ok
gmake -k distclean || true
/bin/rm -rf */.deps/*.P config.cache storage/innobase/config.cache
   storage/bdb/build_unix/config.cache bdb/dist/autom4te.cache autom4te.cache innobase/autom4te.cache;

path=./BUILD
. "./BUILD/autorun.sh"
CC="gcc" CFLAGS="-Wimplicit -Wreturn-type -Wswitch -Wtrigraphs -Wcomment -W -Wchar-subscripts -Wformat -Wparentheses -Wsign-compare -Wwrite-strings -Wunused
-DUNIV_MUST_NOT_INLINE -DEXTRA_DEBUG -DFORCE_INIT_OF_VARS  -DSAFEMALLOC -DPEDANTIC_SAFEMALLOC -DSAFE_MUTEX"
CXX="gcc" CXXFLAGS="-Wimplicit -Wreturn-type -Wswitch -Wtrigraphs -Wcomment -W -Wchar-subscripts -Wformat -Wparentheses -Wsign-compare -Wwrite-strings -Wctor-dtor-privacy
-Wnon-virtual-dtor -felide-constructors -fno-exceptions -fno-rtti  -DUNIV_MUST_NOT_INLINE -DEXTRA_DEBUG -DFORCE_INIT_OF_VARS
-DSAFEMALLOC -DPEDANTIC_SAFEMALLOC -DSAFE_MUTEX" CXXLDFLAGS=""
./configure --prefix=/home/mysqldev/deploy --enable-assembler  --with-extra-charsets=complex  --enable-thread-safe-client
--with-readline  --with-big-tables  --with-debug=full --enable-local-infile
gmake -j 4

In my case this helped with my problem, that is the autorun.sh command is throwing an error in the Innodb configuration.
Part of the compiling is you can turn Innodb off with –without-innodb, but the BUILD scripts don’t accept parameters, so it’s a matter of ignoring the autorun.sh command above, and adding –without-innodb to the configure.

As it turned out, the autorun.sh provided the additional storage engine support.

Details of the problem in this instance.

[mysqldev@lamda mysql-5.1.12-beta-nightly-20060801]$ . "./BUILD/autorun.sh"
Updating Berkeley DB source tree permissions...
../build_win32/include.tcl: no such file or directory
Updating Berkeley DB README file...
Building ../README
autoconf: building aclocal.m4...
autoconf: running autoheader to build config.hin...
autoconf: running autoconf to build configure
Building ../test/logtrack.list
cp: cannot create regular file `../test/logtrack.list': No such file or directory
Building ../build_win32/db.h
configure.in:723: required file `zlib/Makefile.in' not found
Can't execute automake

The good news is today, the MySQL Barcamp was announced, a great place to get some better insite into the gurus of the MySQL internals. At least a place to ask these questions providing they have a beginner group.

My Next Tutorial will take a look at the example provided by Brian Aker at the recent MySQL Users Conference HackFest B: Creating New SHOW Commands.

MySQL Response to Bugs

July 31, 2006 by ronald

I’ve read at times people complaining about the response to bugs, and people bag the support of MySQL on the forums at times.

Well today I logged a bug, not the first and I’m sure it’s not the last. See LAST_INSERT_ID() does not return results for a problem in the latest Connector/J 5.0.3 that was released just recently.

Now it took me about 2 hours to log the bug, and probably at least 2 hours of frustration prior to that. The initial frustration 2 hours was unsuccessful debugging of what I was sure was valid code (and it was near midnight last night). The second 2 hours today was testing the problem between two environments, different database versions and different Connector/J versions, and providing a simple reproducable case of said problem.

So the timeline of the Bug in the MySQL Bug Tracking System.

[1 Aug 5:40] – Logged Bug
[1 Aug 5:58] – Initial feedback, something for me to confirm, bug “in Progress”
[1 Aug 6:04] – Confirmation that problem is new JDBC-compliant features, and the issue was a backwards-compatible option
[1 Aug 7:14] – Patch submitted to correct the issue
[1 Aug 7:28] – My response to initial feedback, as I hadn’t checked my mail for at least an hour, and then I had to go find the JSTL source code and check

So let me summarise that.

For aS3 (Non-critical) Bug, it took 18 mins from logging for MySQL AB to acknowledge and review the problem, 24 mins from logging to get confirmation of compatibility issue, and 94 mins from logging to have a patch submitted that addressed this and probably some more compatibility issues across 7 source files.

Good Job Mark Matthews. Where do I send the comments for the “MySQL AB employee of the month” award.

Unexplained replication authentication behaviour

July 19, 2006 by ronald

I’m playing with the latest 5.1.11 beta in a master/slave replication situation. Given a lack of H/W lying around I’m configuring a mixed H/W setup to leverage an existing office’s after hours CPUs running Windows XP for my slaves. So here is my test setup.

Server

Linux – 192.168.100.170

The following are the relevent my.cnf settings

[mysqld]
port=3307
server-id = 1
log-bin=mysql-bin

Confirming the server. I did some test commands prior to confirm the position.

$ mysql -uroot -p mysql
Your MySQL connection id is 47 to server version: 5.1.11-beta-log
mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |     1759 |              |                  |
+------------------+----------+--------------+------------------+

Slave

Windows XP – 192.168.100.171

The following are the relevent my.cnf settings

server-id	= 11
master-host     = 192.168.100.170
master-user     = rep
master-password = dolphin
master-port     = 3307

Test

I’ve taken a few steps back for the documentation, to clearly identify any setup problems for those that are also reviewing this to setup replication.

Starting the Slave, I get the following error.

060720 16:36:22 [ERROR] Slave I/O thread: error connecting to master '[email protected]:3307':
  Error: 'Host '192.168.100.171' is not allowed to connect to this MySQL server'
  errno: 1130  retry-time: 60  retries: 86400

Makes sense, you need to give access to the slave to access the master. So to the uninitiated you may miss that a slave requires different access then a default ‘all’ for example.

On the master.

$ mysql -uroot -p mysql
mysql>  grant replication slave on *.* to rep@'192.168.100.%' identified by 'dolphin';

Just a side note, when defining the master replication you can specifically state databases to include or ignore, but to set up the appropiate grant on a particular database fails. For example:

mysql> grant replication slave on test.* to rep@'192.168.100.%' identified by 'dolphin';
ERROR 1221 (HY000): Incorrect usage of DB GRANT and GLOBAL PRIVILEGES

But the problem was not indeed here, looking closer at the error message, the host and the port matched the my.cnf file, but the user didn’t. It was ‘root’ when it should have been ‘rep’.

So, some advice later from the Planet MySQL #IRC (thanks Sheeri) you find if you RTFM at MySQL 5.1 Manual – Replication Startup Options.

If the master.info file exists when the slave server starts, the server uses its contents and ignores any options that correspond to the values listed in the file. Thus, if you start the slave server with different values of the startup options that correspond to values in the master.info file, the different values have no effect, because the server continues to use the master.info file. To use different values, you must either restart after removing the master.info file or (preferably) use the CHANGE MASTER TO statement to reset the values while the slave is running.

So investigating the master.info file located in the slave data directory I get:

14
mysql-bin.000001
1839
192.168.100.170
root
dolphin
3307
60
0

There is a mismatch between the master.info file and the my.cnf file. I don’t recall the exact series of events when I setup the Windows XP installation, I know I first tested it as a master, and it’s most likely I then configured the slave settings, but used a master-user of ‘root’ initially and started the server. Then thinking this was insecure before I’d created the permissions, I changed the my.cnf file to use the user ‘rep’.

So the lessons today are:

Correctly configure your slave settings before starting the mysql server for the first time.
MySQL records slave replication authentication information in two files, the my.cnf and the master.info.
Making changes to my.cnf are not reflected in an existing master.info. In this situtation use the CHANGE MASTER TO command in addition.
Read the Manual. MySQL does have good documentation, and when it’s not quite right, like I found yesterday with Bug #21134, report it to help everybody in the future.

Now I fully agree this is a Replication 101 error. My excuse is the last replication I setup was in 2003 under 3.23, and it worked perfectly well in a production system for 2 years with a number of 3.23 and 4.0 slaves. Hence while I’m just refreshing my replication skills now.