Watching Replication in action

For all those instant GUI people out there, there is an easy way to watch the present status of your MySQL Slaves using the watch command.

$ watch -n 1 -d "mysql -uroot -pxxxx mysql -e 'SHOW SLAVE STATUS\G'"

The watch provides a view of a file or command, and shows interval updates to this output (-n  seconds> option). You can also specific a granularity better then one second for example 0.5. -d also highlights the differences for you. So while you see the following output with your SHOW SLAVE STATUS, on a loaded system you will also see bin-log and relay-log changes, and perhaps Seconds_Behind_Master.

The question is, Why is Seconds_Behind_Master the last column in this display?

*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: localhost
Master_User: repl
Master_Port: 10002
Connect_Retry: 60
Master_Log_File: master-bin.000006
Read_Master_Log_Pos: 102
Relay_Log_File: newyork-relay-bin.000055
Relay_Log_Pos: 244
Relay_Master_Log_File: master-bin.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Skip_Counter: 0
Exec_Master_Log_Pos: 102
Relay_Log_Space: 539
Until_Condition: None
Until_Log_Pos: 0
Master_SSL_Allowed: No
Seconds_Behind_Master: 0

Smarter indexing for column LIKE '%string%'

With my very heavy travel load and skilling load I’ve not had time to scratch myself. It hasn’t stopped the brain working overtime on various issues including the classic find a pattern in a string starting with a wildcard character. On a recent gig I saw the true classic.

SELECT columns
FROM users
WHERE username LIKE '%str%'
OR firstname LIKE '%str%'
OR lastname LIKE '%str%'

I went through the various options and comments on leading ‘%’, OR’s, combined columns, FULLTEXT (which doesn’t work in this case), merge indexing etc, however it perplexed me that nobody has really solved this problem, or at least shared their solutions.

I have an idea, a theory and while I’d love to prove/disprove it, I simply just don’t have the time. So here are my notes, hopefully somebody can comment positively/negatively, do some research, or encourage me to pursue it more.

The Problem

The problem is quite simply the leading wildcard, Der!. So how do you eliminate the wildcard from the search?

My Idea

Working with the earlier example, and having a concatenated field of the three components (username,firstname,lastname), my idea is to encode into 27 bits, a bit for each alphabetic character (A-Z) found, and any non-alphabetic character except whitespace in bit 27.

This column is then indexed and searched for bitwise matches. Here is a pictorial description.

String a b c d e f g h i j k l m n o p q r s t u v w x y z
Ronald Bradford 1 1   1   1           1   1 1     1                
Search: ‘brad’ 1 1   1                           1                 *** MATCH ***
Search: ‘fred’       1 1 1                       1                 NO MATCH

My idea is very simple, the question is, will it actually work.

The Questions

The goal is quite obviously to get an Index utilization, to a maximum say of 10%-25% of matching rows. Factors that would affect this include:

  • If search string is one character, like a vowel I could this not useful, so an implied criteria for optimal work is at least 2 or 3 characters.
  • I see the effectiveness lost on large column values, it could be that only up to say 30 characters is optimal, I see strings of 100+ characters would be ineffective.
  • It doesn’t support case-sensitive searching or non ASCII searching

The Tests

  • Create good sized table and distribution of first/last names (A IMDB Actors table would work)
  • Create a concatenated column of searchable fields (e.g. details = CONCAT_WS(‘ ‘,firstname,lastname)
  • Create function to return bitwised search string (e.g. bitize(str))
  • Create Indexed bitwise column and pre-populate accordingly (e.g. bitfulldetails)
  • Create BEFORE INSERT|BEFORE UPDATE Triggers to populate Indexed bitwised column
  • Test the sucker (of course you will need to include the actual LIKE command as well in the WHERE clause)

A sample query would then be:

SELECT columns
FROM users
WHERE bitize('str') & bitfulldetails
AND fulldetails like '%str%'

So the challenge to all those budding MySQL Gurus, does it seem plausible?

What is the maximum number of colons ':' that may appear in a valid URL?

In idle conversation I was asked by MM.

Question: What is the maximum number of colons ‘:’ that may appear in a valid URL?

* If you said zero to one, then you are victim of browsers, and you have never used anything but a browser.

* If you said one, then your a novice.

* If you said two, then you have probably seen http://host:port at some time.

* If you said three, then you would be correct, the elite.

http://user:[email protected]:port/location

For the record my initial answer was 2.

[email protected]

I’ll be speaking at the upcoming 2007 MySQL Conference & Expo (Why they dropped the word User, who knows), this time with Guy Harrison (Author of many books including MySQL Stored Procedures). We will be talking on MySQL for Oracle DBAs and Developers.

Anyway, good friend Paul McCullagh, creator of PBXT will be also speaking on PrimeBase XT: Design and Implementation of a Transactional Storage Engine. He coined to me in an email “CU at the UC”. I’ve done a further level of refactoring, and added marketing. You can buy the shirt online here. (More colors including black and products coming, if you want it now, please ask).

Using Innodb Primary Keys wisely

At a customer site recently I came across the following table definition for an Innodb table. 140 columns, 3 part primary key totally 44 bytes, 2 million rows giving over 900MB in data size. And it had 15 indexes, totaling over 2.3GB in size. Add into the mix a Windows Server 2003 OS, 640MB innodb_buffer_pool_size setting and table scans out the warzoo. This all leads to very poor performance.

In is generally considered best practice for Innodb to use the shortest primary key as possible, and there is a clear reason for this. Innodb stores with every index the full primary key. So for example if an indexed column was 4 bytes in length, in this example the index row would be 48 bytes (before overhead). Fortunately an easy solution presented itself, because of this index storage requirement, Innodb will create an internal 6 byte primary key if none exists for a table. I of course had known about this but had never tried it in a production situation. I come from old school where every table is defined with a primary key.

So a ALTER TABLE [name] DROP PRIMARY KEY results in a long wait, and a reduction in the Index size to 900MB. WOOT! Now, in closer analysis the Primary Key is the Primary Key because it’s the Unique requirement for the table. No problem I just add a Unique Key to replace the previously dropped Primary Key. A check to review the Index Size showed a size of 2.3GB. What the!

It seems if you read the fine print of the MySQL documentation for Innodb Table Structures there is an exception clause. If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.

In my situation, by adding a Unique Key this was in turn converted internally to the new Primary Key. Drat! So to the Innodb developers out there. I’d like to see a way for the internally generated key to remain in this situation, or at least provide the ability for the designer to choose this capability.

The only result is to physically create an INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY column definition. This may seem like not a big thing, but the customer did not want to make any schema changes. However it was a necessity simply as the first step to get reasonable performance.

This was only the first part of the problem, solving the full table scans via creative Indexing was not possible, code changes were out of the question for an immediate fix. The real problem was the size of the innodb_buffer_pool_size. It was just too small to handle both the table data and index pages in memory. Perfmon analysis showed the machine simply when into intensive disk I/O every time any queries were run. On a Linux system with just a database and Innodb tables, it can be recommended that 70-80% of available memory can be allocated to the innodb_buffer_pool_size.

Alas this is Windows 32 bit, and their is an implied 2GB memory limit for any process, so the best one could manage in this situation was 1600MB.
The long and the short was that even with poor database design, and immediate performance improvement occurred with an optimal Primary Key and sufficient memory allocated to the correct MySQL buffers. This is only a temporary fix for a greater problem.

MySQL Camp T-Shirts

For those that attended the MySQL Camp at Google HQ late last year you may have seen me with my own T-Shirt designs. A number of people inquired about getting them. I’ve finally got around to make them available online, so anybody that would like one can order online.

There are two different shirts. If you want your name on the shirt, you need to make sure you choose the correct one.

  • Early Adopters – For those that were the first 48 that signed up, your name as well as position and company are on the shirt.
  • The Herd – For everybody that registered on the website, your name is on the shirt.

Ok. I’ve already been asked why 48. This was the number of registrants when I got the shirt made back in Australia a week or so before the Camp.

There are also plenty more of my MySQL designs at my MySQL Wear Store.

For those that also liked the runner up pin “A mouse is a waste of a perfectly good hand”, you can also get this in it’s original graphical shirt design at Geek Cool – CLI”

How's your desktop

People that know me, know that I can’t just do one thing at once. People that have seen me work also know just how I can’t just work on a single thing, but work with multiple monitors, desktops and programs everywhere. (Power to my Dell 24″ Widescreen LCD with 1920×1200 resolution). However I must bow when I see another professional of equal skill, however this does boarder on lunacy.

I had to printscreen friend and fellow MySQLer Frank’s desktop, it just defied a sense of normality, yet he worked away quite happily. In some ways I’m also amazed that Windoze actually functioned.

Take a close look at the number of open Firefox windows and putty sessions.

MySQL Predictions for 2007

I’m interested to know what people consider will behold MySQL in 2007?

The announcement of “You” as Time person of the year can only considered a huge boost to the opportunities in 2007. So, in 2007 here are my 7 (in no significant order).

  1. 2007 will be the year of the storage engine. We will see 5 offerings for transactional storage engines, 20+ available and practical engines for management of some form of data.
  2. 2007 will see MySQL 5.1 GA (finally).
  3. 2007 will see MySQL release it’s own Falcon Storage Engine (GA not until Q4 :-().
  4. The MySQL Winter of Code will enable the contributions of the community to change feature development. I foresee a Bounty system from an external party or parties for MySQL Features emerging.
  5. MySQL will make major press inroads to the RDBMS Big 3 of Oracle, SQL Server and IBM DB2.
  6. Despite efforts of MySQL AB, major installations of MySQL 4.0 and 4.1 including large ISP’s will hamper the uptake of 5.0 and 5.1 and the de-commissioning of 4.x
  7. A major country government will make an announcement to move to Open Source across servers and desktops, and MySQL will contribute to being an enterprise database offering in systems replacements as part of a longer term strategy.