SHOW STATUS Gotcha

Well, it’s Sunday night so I will put this down to being the weekend. The background to being caught out is a request I made to my local Users Group mailing list for some information on people’s environments because I wanted to some empirical data analysis without having any more knowledge of the systems.

In summary (without the surrounding fan-fare, I was seeking):

SELECT VERSION();
SHOW STATUS;
SHOW VARIABLES;  // Optional

I was however perplexed why my first data point analysis (Read/Write ratio) using the Status values Com_insert, Com_update, Com_delete and Com_select was not always giving me expected results. In particular, a number of server results showed 0 for values while I knew the results came from working MySQL environments.

So, sanity check with good friend Morgan and I get the response to answer the dilemma SHOW STATUS defaults to session based statistics in 5.0+, there’s a million people it’s caught out, and many bugs about “xyz variable not being incremented”.

Well, it’s Sunday night so I will put this down to being the weekend. The background to being caught out is a request I made to my local Users Group mailing list for some information on people’s environments because I wanted to some empirical data analysis without having any more knowledge of the systems.

So I’m really seeking:

SELECT VERSION();
SHOW GLOBAL STATUS;
SHOW GLOBAL VARIABLES;  // Optional

Well, that’s my MySQL trivia point for the day. MySQL between version 4.1 and 5.0 made a major change with the default value. The MySQL 5.0 Manual also states this.

UPDATE 23 Sep 2006

Following Eric’s Show @&!# status again!, I did then read the Bug #19093, and it was great to see Monty comment (that’s open source for you, enabling even founders to mix with the common folk), and I’d tend to agree to his comments. MySQL has attempted to correct something.

Perhaps the best compromise to the problem is to deprecate and drop the SHOW STATUS command, and make the scope component GLOBAL|SESSION compulsory, so SHOW GLOBAL STATUS or SHOW SESSION STATUS are the only valid commands!

The fast pace of technology in a Web 2.0 world

I had need to goto the Wikipedia this morning to review the terminology of something, and on the front page in Today’s featured article is Mercury. Being a tad curious given I’d heard only on the radio a few hours ago that Pluto was no longer a planet in our Solar System, I drilled down to the bottom to check references to other planets (quicker then searching). So at the bottom I found the following graphic and details of The Solar System Summary.

Well blow me down, they didn’t waste any time there. Pluto is no longer a planet in our Solar System. It is now categorised as a Dwarf Planet.

There are over 100 edits to this page on Dwarf Planet in the past 12 hours, including all the links in Pluto and Solar System correctly referenced.

Now, that’s the power of content by the community, one of the characteristics of Web 2.0.
Of course all of this runs under the LAMP Stack and powered by the MySQL Database. Combined with the fact this is a breed of organisation that didn’t start with large amount of Venture Capital, another trend of the newer generation of popular and successful internet enterprises.

The RAT and the CAT

No, it’s not a bedtime story, is a serious system’s design concept and I’m amazing that people don’t know about this.
As I mentioned in If you don’t know your data, you don’t know your application I was doing a Java Code Review, and I found a clear case of a much simplier solution. How simple you ask?

Well, without completing the task 100%, I achieved in less then 1 day (and lets say for the argument 1 more day of work), what is being worked on by somebody else for a week, with an estimate of 2 more weeks to complete. So let’s add 50% to my estimate, that’s a total of 3 days verses 15 days. You do the math. and yes that was last week and that task is still being worked on the same way, even with reference to my working code. Not to mention the code is a similiar magnitude of simplicity, and simplicity means cost savings in support, people so quickly forget that.

So what is this RAT and CAT:

  • RAT – Row At a Time
  • CAT – Chunk At a Time

This concept is really rather simple, however it’s amazing how implementations never consider this. Let me give you a simple example.

Your importing some external data (a flat file of 5 columns, and is inturn will be stored in 2 tables in a normalised form). So the RAT way of doing things would entail:

  • Using the language in question, open the file, read line by line, converting into object for reference.
  • Now for each row read you
    • You get the values (A and B), and then do a select to see if this row already exists in one of the normalised tables (calling in X). If it does, you have the surrogate key, else you insert the row and get the surrogate key
    • Now you do the same with the next set of values (C, D, E) which reference the normalised values (A and B), inserting into Y)
    • And so on and so on.
    • Report exceptions line by line if found

For those now laughing, it’s not really funny, it’s sad that programs are written this way. For those that say, but that’s the only way, well welcome to a different, radical and hard to learn approach.

The Cat way of doing things would entail:

  • Create a temporary table
  • Bulk load the data into the temporary table via appropiate command, e.g. mysqlimport or LOAD DATA.
  • Using one select, insert into X rows from temporary table that are not already present
  • Using one select, insert into Y rows from the temporart table that are not present, joining to X to get the appropiate surrogate key
  • Report exceptions via one select of data that wasn’t inserted due to some rules

It’s not rocket science people.

I should mention this is a simple example, and it’s not always possible to do CAT processing for all operations, but generally some portion of batch work can be, and the remaining must be done in a RAT way.

Updated
Some more recent articles including We need more CATs (2009) and The Art of Elimination (2010)

If you don't know your data, you don't know your application.

The art of data modelling is definitely lost on some [most] people, or they never found it, even though they think they did. Over dinner with good friend Morgan last night we were swapping present stories on the topic.

Morgan wrote recently about I want my 4 bytes back damn it., and interesting example storing an ISBN. Further reference can be found at Getting started with MySQL of a more impractical ISBN example.

Disk is cheap now, so the attitude and poor excuse can be, well a few extra bytes doesn’t matter. Well no! If your a social hacker and have a website with a maximium concurrent connections of 2 maybe, but much like some recent Java Code Reviewing I just performed, just because the system isn’t 24×7, doesn’t give you excuse to be lazy about writing the code not to handle concurrency, thread safety and also as efficient as possible, in this case RAT verses CAT. (I’ll need to write about this, it seemed to go over some of the other professionals even)

I can remember a very specific example some 10 years ago in doing some performance analysis on a site. I’d identified the need for an additional index on a table. Now this table was sized for 200 million rows, and it already had about 70 million. The problem was adding another index required 4GB disk allocation. These things have an effect on sizing, growth and backups.

So the impact on appropiate sizing can clearly have an effect, if it was just one poorly sized column that’s acceptable (just), but normally it’s a pattern that litters a data model.

What’s important to realise is, it’s not just diskspace, it’s also memory. Without really touching on sizing data, I did mention some examples previously in Improving Open Source Databases – WordPress. Here the use of BIGINT(20) for primary keys proved my point. That’s 8 bytes, but unless you have going to have 4 billion categories, it’s a waste. It’s a waste when it’s a foreign key in a table, and it’s a big waste when it’s indexed, and that index is then in memory, and wasting more precious resources.

So how to do identify when the designer of the model has no idea about the intrinsic data value being stored? If you see tables with VARCHAR(255), that’s a clear sign. They have no idea regarding the data, so a default limit is used. Morgan referred to it as “Shooting guns in the dark with your eyes closed”. Books don’t help the cause, I was just skimming High Performance MySQL last night (one of the many freebies from the UC). There on page 82, is a table definition with not one column, but two with varchar(255). Hmmm!

If you see any new applications with VARHAR(255) they are even more lost, because MySQL 5, which has been around quite some time now, supports VARCHAR(65535). Speaking of that, has anybody seen VARCHAR(65535). I’d like to know.

Another example, is in Sheeri’s Top 8 SQL Best Practices Point 4 in regards to storing IP’s effeciently. If you log for example every page hit, this can be your largest table, and moving from varchar(15) to int can save you upto 11 bytes per row alone.

These may just be simple examples, but it’s the principle. When you define a column, consider it’s data, if you don’t know then take the time to do the reasearch and learn.

Become named in Firefox 2

So, FireFox have come up with a novel idea to promote it’s product. Check out Firefox Day.

The official blurb: Share Firefox with a friend. If your friend downloads Firefox before September 15, you’ll both be immortalized in Firefox 2.

You can even choose how to link your names together on the “Firefox Friends Wall”. Examples like ‘my name’ Informed ‘your name’, or ‘my name’ Empowered ‘your name’, or ‘my name’ Liberated ‘your name’.

Perhaps MySQL can leverage this idea for some what to promote future download!



Pride

I’ve recently completed a contract and I’ve been in discussions with agents and other employers for further work. Having had one of the worse experiences in my previous work, I’ve been extra careful to ensure what I’m told at the interview/meeting stage is indeed true and accurate (in my last case it was not). I’ve also not made the assumption that an organisation that is dependent on software has placed a certain level of value on what’s in place. (in my last case I did, simply due to the size of the organisation and volume of business).

So, when being asked by people what I’m seeking, outside of the technical skills and compendencies, I’m seeking an organisation that places value on it’s existing software, it’s software quality, it’s software improvement and most importantly it’s software developers. It was unfortunate that for an organisation that lived in software, and would not survive long (especially at the present scale of operations) without it, they met none of these criteria. It was really sad, and overall I found the environment “depressing”.

Now “depressing” could be considered a harsh word, but it comes from both my ethos and also present circumstances in life. I had these two post-it notes on my wall at work, which summed up “What I was seeking/Where I wanted to be”, and “Where I was” in my job.

There have been a few articles recently, it was probably Marcus in Supporting ideas and being productive, based on earlier articles of How to Come Up With Ideas and How To Kill Good Ideas by Zack that motivated me to publish this. These articles have many good points, and while I could put my slant on it, I wanted to focus on just one thing.

In Software Development 10 years ago I was driven, now I know and I’m driven by passion. I always strive for something better then their presently is, seeking better quality, and better productivity, and most importantly today, simplicity.

Everybody is at a different position in life and it’s important to find the right fit. For me money is not a motivator, being productive, making what I’m doing better, where I’m working more enjoyable and productive, and who I’m working with a better place to be are important to me.

I’ve choose the word for the moment to best represent this as PRIDE.

Pride is:

  • A sense of one’s own proper dignity or value; self-respect
  • Pleasure or satisfaction taken in an achievement, possession, or association
  • A cause or source of pleasure or satisfaction; the best of a group or class
  • The most successful or thriving condition
  • To indulge (oneself) in a feeling of pleasure or satisfaction

For me, I’m seeking an organisation that takes pride in it’s people, it’s products and it’s future direction, and that attracts people that have pride in what they do, they way they think and how they interact with others.

Compiling MySQL Tutorial 2 – Directly from the source


Should you want to be on the bleeding edge, or in my case, don’t want to download 70MB each day in a daily snapshot (especially when I’m getting build errors), you can use Bit Keeper Free Bit Keeper Client that at least lets you download the MySQL Repository. This client doesn’t allow commits, which is a good thing for those non-gurus in mysql internals (which definitely includes me).

wget http://www.bitmover.com/bk-client.shar
/bin/sh bk-client.shar
cd bk_client-1.1
make

By placing sfioball in your path you can execute.

sfioball bk://mysql.bkbits.net/mysql-5.1 mysql-5.1

This took me about 4 mins, which seemed much quicker then getting a snapshot!

You can then get cracking with my instructions at Compiling MySQL Tutorial 1 – The Baseline.

A good reference in all this compiling is to take a good look at the MySQL Internals Manual. (which I only found out about recently)

If at a later time you want to update your repository to the latest, use the following command.

update bk://mysql.bkbits.net/mysql-5.1 mysql-5.1

Documentation References

5.1 Reference Manual – 2.9. MySQL Installation Using a Source Distribution
5.1 Reference Manual – 2.9.3. Installing from the Development Source Tree
5.1 Reference Manual – 2.13.1.3. Linux Source Distribution Notes
MySQL Internals Manual

Requirements for compiling

To confirm earlier notes on minimum requirements for compiling the following details should be confirmed.

automake --version
autoconf --version
libtool --version
m4 --version
gcc --version
gmake --version
bison -version


My results running Fedora Core 5.

$ automake --version
automake (GNU automake) 1.9.6
$ autoconf --version
autoconf (GNU Autoconf) 2.59
$ libtool --version
ltmain.sh (GNU libtool) 1.5.22 (1.1220.2.365 2005/12/18 22:14:06)
$ m4 --version
GNU M4 1.4.4
$ gcc --version
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
$ gmake --version
GNU Make 3.80
$ bison -version
bison (GNU Bison) 2.1

Compiling MySQL Tutorial 1 – The Baseline – Update

Just to confirm my earlier confusion about verified snapshots at Compiling MySQL Tutorial 1 – The Baseline.

“Daily snapshot sources are only published, if they compiled successfully (using the BUILD/compile-pentium-debug-max script) and passed the test suite (using make test). If the source tree snapshot fails to compile or does not pass the test suite, the source tarball will not be published.”

Seems the fine print at MySQL Database Server 5.1: Beta snapshots also states this. Well, need to take my RTFM pill there.

Thanks to Lenz for putting the record straight, and helping with my Forum Post. Seems I did uncover a Bug, now recorded as Bug #21463. Just need to get it fixed to continue on my tutorial.

Compiling MySQL Tutorial 1 – The Baseline

Pre-requisites

This tutorial is aimed at Linux installations that has the standard development tools already installed. The INSTALL file in the source archives provides good details of the software required (e.g. gunzip, gcc 2.95.2+,make).

Create a separate user for this development work.

su -
useradd mysqldev
su - mysqldev

Get an appropiate Snapshot

You can view recent snapshots at http://downloads.mysql.com/snapshots/mysql-5.1/.

The official statement on snapshots from MySQL AB.
Daily snapshot sources are only published, if they compiled successfully (using the BUILD/compile-pentium-debug-max script) and passed the test suite (using make test). If the source tree snapshot fails to compile or does not pass the test suite, the source tarball will not be published.

At the time of producing this the present snapshot was http://downloads.mysql.com/snapshots/mysql-5.1/mysql-5.1.12-beta-nightly-20060801.tar.gz

mkdir -p $HOME/src
cd $HOME/src
SNAPSHOT="mysql-5.1.12-beta-nightly-20060801";export SNAPSHOT
wget http://downloads.mysql.com/snapshots/mysql-5.1/${SNAPSHOT}.tar.gz
tar xfz ${SNAPSHOT}.tar.gz
cd $SNAPSHOT

Compiling

You can for example do the simple way with most C programs:

DEPLOY=${HOME}/deploy; export DEPLOY
./configure --prefix=${DEPLOY}
make
make install

However it’s recommended you use the significant number of pre-configured build scripts found in the BUILD directory. For Example:

./BUILD/compile-pentium-debug --prefix=${DEPLOY}

I always get the following warnings. Not sure what to do about it, but it doesn’t break any future work to date. Surely somebody in the development team knows about them.

../build_win32/include.tcl: no such file or directory
cp: cannot create regular file `../test/logtrack.list': No such file or directory

Testing

The easy, but time consuming part over, let’s test this new beast.

make install

This will deploy to the preconfigured area of $USER/deploy. For multiple versions within this process you should adjust this back in the compile process.

cd $DEPLOY
bin/mysql_install_db
bin/mysqld_safe &
bin/mysql -e "SELECT VERSION()"
bin/mysqladmin -uroot shutdown

Changes

So if you haven’t already sniffed around there are some reasonable changes in the structure. Here are a few points that caught me out:

  • mysql_install_db is now under /bin not /scripts, infact there is no /scripts
  • mysqld is not under /bin but infact under /libexec. A good reason to always used mysqld_safe
  • by default the data directory is /var, normal documentation etc always has this as /data

This is a quick confirm it all works, and I’ve for the purposes of this example ensured that no other MySQL installation is running, and there is no default /etc/my.cnf file. (I always place this in the MySQL installation directory anyway).

Some minonr considerations for improvements with locations, users and multiple instances.

cd $DEPLOY
rm -rf var
bin/mysql_install_db --datadir=${DEPLOY}/data --user=$USER
bin/mysqld_safe --basedir=${DEPLOY} --datadir=${DEPLOY}/data --user=$USER --port=3307 --socket=/tmp/mysqldev.sock &
bin/mysql -P3307 -S/tmp/mysqldev.sock -e "SELECT VERSION()"
bin/mysqladmin -P3307 -S/tmp/mysqldev.sock  -uroot shutdown

Troubleshooting

Now, there is no guarantee that the snapshot is a working one, in the case of mysql-5.1.12-beta-nightly-20060801 in this example, it didn’t work for me? I’ve just reinstalled my OS to Fedora Core 5, and the previous source version (a 5.1.10 snapshot worked ok)
Here are some tips to help out.

The preconfigured BUILD scripts have a nice display option with -n that allows you to see what will happen.

./BUILD/compile-pentium-debug --prefix=${DEPLOY} -n

In my case it gave me this:

testing pentium4 ... ok
gmake -k distclean || true
/bin/rm -rf */.deps/*.P config.cache storage/innobase/config.cache
   storage/bdb/build_unix/config.cache bdb/dist/autom4te.cache autom4te.cache innobase/autom4te.cache;

path=./BUILD
. "./BUILD/autorun.sh"
CC="gcc" CFLAGS="-Wimplicit -Wreturn-type -Wswitch -Wtrigraphs -Wcomment -W -Wchar-subscripts -Wformat -Wparentheses -Wsign-compare -Wwrite-strings -Wunused
-DUNIV_MUST_NOT_INLINE -DEXTRA_DEBUG -DFORCE_INIT_OF_VARS  -DSAFEMALLOC -DPEDANTIC_SAFEMALLOC -DSAFE_MUTEX"
CXX="gcc" CXXFLAGS="-Wimplicit -Wreturn-type -Wswitch -Wtrigraphs -Wcomment -W -Wchar-subscripts -Wformat -Wparentheses -Wsign-compare -Wwrite-strings -Wctor-dtor-privacy
-Wnon-virtual-dtor -felide-constructors -fno-exceptions -fno-rtti  -DUNIV_MUST_NOT_INLINE -DEXTRA_DEBUG -DFORCE_INIT_OF_VARS
-DSAFEMALLOC -DPEDANTIC_SAFEMALLOC -DSAFE_MUTEX" CXXLDFLAGS=""
./configure --prefix=/home/mysqldev/deploy --enable-assembler  --with-extra-charsets=complex  --enable-thread-safe-client
--with-readline  --with-big-tables  --with-debug=full --enable-local-infile
gmake -j 4

In my case this helped with my problem, that is the autorun.sh command is throwing an error in the Innodb configuration.
Part of the compiling is you can turn Innodb off with –without-innodb, but the BUILD scripts don’t accept parameters, so it’s a matter of ignoring the autorun.sh command above, and adding –without-innodb to the configure.

As it turned out, the autorun.sh provided the additional storage engine support.

Details of the problem in this instance.

[mysqldev@lamda mysql-5.1.12-beta-nightly-20060801]$ . "./BUILD/autorun.sh"
Updating Berkeley DB source tree permissions...
../build_win32/include.tcl: no such file or directory
Updating Berkeley DB README file...
Building ../README
autoconf: building aclocal.m4...
autoconf: running autoheader to build config.hin...
autoconf: running autoconf to build configure
Building ../test/logtrack.list
cp: cannot create regular file `../test/logtrack.list': No such file or directory
Building ../build_win32/db.h
configure.in:723: required file `zlib/Makefile.in' not found
Can't execute automake

The good news is today, the MySQL Barcamp was announced, a great place to get some better insite into the gurus of the MySQL internals. At least a place to ask these questions providing they have a beginner group.

My Next Tutorial will take a look at the example provided by Brian Aker at the recent MySQL Users Conference HackFest B: Creating New SHOW Commands.

MySQL Response to Bugs

I’ve read at times people complaining about the response to bugs, and people bag the support of MySQL on the forums at times.

Well today I logged a bug, not the first and I’m sure it’s not the last. See LAST_INSERT_ID() does not return results for a problem in the latest Connector/J 5.0.3 that was released just recently.

Now it took me about 2 hours to log the bug, and probably at least 2 hours of frustration prior to that. The initial frustration 2 hours was unsuccessful debugging of what I was sure was valid code (and it was near midnight last night). The second 2 hours today was testing the problem between two environments, different database versions and different Connector/J versions, and providing a simple reproducable case of said problem.

So the timeline of the Bug in the MySQL Bug Tracking System.

  • [1 Aug 5:40] – Logged Bug
  • [1 Aug 5:58] – Initial feedback, something for me to confirm, bug “in Progress”
  • [1 Aug 6:04] – Confirmation that problem is new JDBC-compliant features, and the issue was a backwards-compatible option
  • [1 Aug 7:14] – Patch submitted to correct the issue
  • [1 Aug 7:28] – My response to initial feedback, as I hadn’t checked my mail for at least an hour, and then I had to go find the JSTL source code and check

So let me summarise that.

For aS3 (Non-critical) Bug, it took 18 mins from logging for MySQL AB to acknowledge and review the problem, 24 mins from logging to get confirmation of compatibility issue, and 94 mins from logging to have a patch submitted that addressed this and probably some more compatibility issues across 7 source files.

Good Job Mark Matthews. Where do I send the comments for the “MySQL AB employee of the month” award.

Unexplained replication authentication behaviour

I’m playing with the latest 5.1.11 beta in a master/slave replication situation. Given a lack of H/W lying around I’m configuring a mixed H/W setup to leverage an existing office’s after hours CPUs running Windows XP for my slaves. So here is my test setup.

Server

Linux – 192.168.100.170

The following are the relevent my.cnf settings

[mysqld]
port=3307
server-id = 1
log-bin=mysql-bin

Confirming the server. I did some test commands prior to confirm the position.

$ mysql -uroot -p mysql
Your MySQL connection id is 47 to server version: 5.1.11-beta-log
mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |     1759 |              |                  |
+------------------+----------+--------------+------------------+

Slave

Windows XP – 192.168.100.171

The following are the relevent my.cnf settings

server-id	= 11
master-host     = 192.168.100.170
master-user     = rep
master-password = dolphin
master-port     = 3307

Test

I’ve taken a few steps back for the documentation, to clearly identify any setup problems for those that are also reviewing this to setup replication.

Starting the Slave, I get the following error.

060720 16:36:22 [ERROR] Slave I/O thread: error connecting to master '[email protected]:3307':
  Error: 'Host '192.168.100.171' is not allowed to connect to this MySQL server'
  errno: 1130  retry-time: 60  retries: 86400

Makes sense, you need to give access to the slave to access the master. So to the uninitiated you may miss that a slave requires different access then a default ‘all’ for example.

On the master.

$ mysql -uroot -p mysql
mysql>  grant replication slave on *.* to rep@'192.168.100.%' identified by 'dolphin';

Just a side note, when defining the master replication you can specifically state databases to include or ignore, but to set up the appropiate grant on a particular database fails. For example:

mysql> grant replication slave on test.* to rep@'192.168.100.%' identified by 'dolphin';
ERROR 1221 (HY000): Incorrect usage of DB GRANT and GLOBAL PRIVILEGES

But the problem was not indeed here, looking closer at the error message, the host and the port matched the my.cnf file, but the user didn’t. It was ‘root’ when it should have been ‘rep’.

So, some advice later from the Planet MySQL #IRC (thanks Sheeri) you find if you RTFM at MySQL 5.1 Manual – Replication Startup Options.

If the master.info file exists when the slave server starts, the server uses its contents and ignores any options that correspond to the values listed in the file. Thus, if you start the slave server with different values of the startup options that correspond to values in the master.info file, the different values have no effect, because the server continues to use the master.info file. To use different values, you must either restart after removing the master.info file or (preferably) use the CHANGE MASTER TO statement to reset the values while the slave is running.

So investigating the master.info file located in the slave data directory I get:

14
mysql-bin.000001
1839
192.168.100.170
root
dolphin
3307
60
0

There is a mismatch between the master.info file and the my.cnf file. I don’t recall the exact series of events when I setup the Windows XP installation, I know I first tested it as a master, and it’s most likely I then configured the slave settings, but used a master-user of ‘root’ initially and started the server. Then thinking this was insecure before I’d created the permissions, I changed the my.cnf file to use the user ‘rep’.

So the lessons today are:

  1. Correctly configure your slave settings before starting the mysql server for the first time.
  2. MySQL records slave replication authentication information in two files, the my.cnf and the master.info.
  3. Making changes to my.cnf are not reflected in an existing master.info. In this situtation use the CHANGE MASTER TO command in addition.
  4. Read the Manual. MySQL does have good documentation, and when it’s not quite right, like I found yesterday with Bug #21134, report it to help everybody in the future.

Now I fully agree this is a Replication 101 error. My excuse is the last replication I setup was in 2003 under 3.23, and it worked perfectly well in a production system for 2 years with a number of 3.23 and 4.0 slaves. Hence while I’m just refreshing my replication skills now.

Mercurial Version Control Software

I got asked (being a Java developer) about what was involved in creating an Eclipse Plugin for Mercurial. Well in true Google style, why invent when somebody probably already has. A quick check finds Mercurial Eclipse by VecTrace.

Now until last week, I’d never heard of Mercurial, so this is really an introduction to somebody that has no idea.

What is Mercurial?

Mercurial is a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.

Ok, so big deal, I use CVS. I also use Subversion (SVN) for my Apache contributions, and also for MySQL GUI products. Why do we need another Version Control Product? Mercurial is a Distributed Software Configuration Management Tool. The following is from the Mercurial Wiki

A distributed SCM tool is designed to support a model in which each Repository is loosely coupled to many others. Each Repository contains a complete set of metadata describing one or more projects. These repositories may be located almost anywhere. Individual developers only need access to their own repositories, not to a central one, in order to Commit changes.

Distributed SCMs provide mechanisms for propagating changes between repositories.

Distributed SCMs are in contrast to CentralisedSCMs.

So, clearly a distributed model would well in a large distributed organisation where external factors limit continous access to a single central repository. Low Bandwidth, Poor Internet Connectivity, being on a plane, and travelling are all things that would make a distributed model a more ideal solution. I know I’ve taken my laptop away, and being an “Agile Methodology” developer, I commit often. When you have several days of uncommitted work it goes against the normal operation.

You can get more information at the official website at http://www.selenic.com/mercurial. A few quick links are: Quick Start Guide, Tutorial, Glossary.

Installing Mercurial

su -
cd /src
wget http://www.selenic.com/mercurial/release/mercurial-0.9.tar.gz
tar xvfz mercurial-0.9.tar.gz
cd mercurial-0.9
# NOTE: Requires python 2.3 or better.
python -V
python setup.py install --force

A quick check of the syntax.

$ hg
Mercurial Distributed SCM

basic commands (use "hg help" for the full list or option "-v" for details):

 add        add the specified files on the next commit
 annotate   show changeset information per file line
 clone      make a copy of an existing repository
 commit     commit the specified files or all outstanding changes
 diff       diff repository (or selected files)
 export     dump the header and diffs for one or more changesets
 init       create a new repository in the given directory
 log        show revision history of entire repository or files
 parents    show the parents of the working dir or revision
 pull       pull changes from the specified source
 push       push changes to the specified destination
 remove     remove the specified files on the next commit
 revert     revert files or dirs to their states as of some revision
 serve      export the repository via HTTP
 status     show changed files in the working directory
 update     update or merge working directory

A more detailed list:

$ hg help
Mercurial Distributed SCM

list of commands (use "hg help -v" to show aliases and global options):

 add        add the specified files on the next commit
 annotate   show changeset information per file line
 archive    create unversioned archive of a repository revision
 backout    reverse effect of earlier changeset
 bundle     create a changegroup file
 cat        output the latest or given revisions of files
 clone      make a copy of an existing repository
 commit     commit the specified files or all outstanding changes
 copy       mark files as copied for the next commit
 diff       diff repository (or selected files)
 export     dump the header and diffs for one or more changesets
 grep       search for a pattern in specified files and revisions
 heads      show current repository heads
 help       show help for a given command or all commands
 identify   print information about the working copy
 import     import an ordered set of patches
 incoming   show new changesets found in source
 init       create a new repository in the given directory
 locate     locate files matching specific patterns
 log        show revision history of entire repository or files
 manifest   output the latest or given revision of the project manifest
 merge      Merge working directory with another revision
 outgoing   show changesets not found in destination
 parents    show the parents of the working dir or revision
 paths      show definition of symbolic path names
 pull       pull changes from the specified source
 push       push changes to the specified destination
 recover    roll back an interrupted transaction
 remove     remove the specified files on the next commit
 rename     rename files; equivalent of copy + remove
 revert     revert files or dirs to their states as of some revision
 rollback   roll back the last transaction in this repository
 root       print the root (top) of the current working dir
 serve      export the repository via HTTP
 status     show changed files in the working directory
 tag        add a tag for the current tip or a given revision
 tags       list repository tags
 tip        show the tip revision
 unbundle   apply a changegroup file
 update     update or merge working directory
 verify     verify the integrity of the repository
 version    output version and copyright information

Mercurial Eclipse Plugin

The plugin is still in it’s early days, but the “FREEDOM” of open source enables me to easily review. After a quick install and review of docs, I shot off an email to the developer, stating why I was looking, and while I have other projects on the go, I asked what I could do to help. It’s only be 2 days and we have already communicated via email several times on various topics. That’s one reason why I really love the Open Source Community. Generally people are very receptive to feedback, comments and especially help.

Within Eclipse

  • Help ->Software Updates-> Find and install…
  • Select “Search for new features to install”, click Next
  • Click “New Remote site…”
  • Enter following details and click Ok
    • Name: MercurialEclipse Beta site
    • URL: http://zingo.homeip.net:8000/eclipse-betaupdate/
  • Follow the prompts to accept the license and download.

So now with Eclipse, on a project you can simply go Right Click -> Team -> Share Project -> Select Mercurial

A Quick Mercurial Tutorial

Of course the quickest way to learn about using Mercurial is to look at an existing product. So taking this plugin project for a spin.

$ cd /tmp
$ hg clone http://zingo.homeip.net:8000/hg/mercurialeclipse com.vectrace.MercurialEclipse
$ hg clone com.vectrace.MercurialEclipse example
$ cd example
# Create some new dummy files
$ touch test.txt
$ touch html/test.html
# View files against respository status
$ hg status
? html/test.html
? test.txt
# Add the new files
$ hg add
adding html/test.html
adding test.txt
# Commit changes
$ hg commit -m "Testing Mercurial"

So other then the second clone command (which enabled me to not mess up the original repository and to test distributed handling next), this is just the same as CVS (checkout, diff, add, commit)

# The Distributed nature involves first Pulling from the "upstream" respository
$ hg pull ../com.vectrace.MercurialEclipse
pulling from ../com.vectrace.MercurialEclipse
searching for changes
no changes found
# Confirm our new file is not in "upstream" respository
$ ls ../com.vectrace.MercurialEclipse/test.txt
ls: ../com.vectrace.MercurialEclipse/test.txt: No such file or directory
# Push local respository changes to "upstream" respository
$ hg push ../com.vectrace.MercurialEclipse
pushing to ../com.vectrace.MercurialEclipse
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 2 changes to 2 files
$ ls ../com.vectrace.MercurialEclipse/test.txt
ls: ../com.vectrace.MercurialEclipse/test.txt: No such file or directory

Hmmm, missed something here. The Quick Start Guide docs seems to want to work back into the original respository, pulling in changes from the “downstream” repository, then executing a merge and commit command. I don’t want to do this just in case I start messing up the original repository. Time for me to create a new standalone repository so I don’t screw anything up. Stay tuned for an updated tutorial.

Of course this is very basic, I need to look into commands used in CVS like login, tagging (versioning), branching, diffing revisions and merging but this is a start.

I do have a concern with the number of distributed respositories, when and how do you know you are in sync with the “master”, indeed what is the “master”. It does merit some investigation to see what management is in place, like identifying all respositories, and comparing for example.

Conclusion

Ok, so now I’ve got a grasp on Mercurial, time to review the Java Code and see what works and what doesn’t in the Eclipse environment. Of course I don’t use Mercurial so what I may consider as functionality required, may be lower priority to those users out there. Any feedback I’m sure will be most appreciated by the original developer.

What is software quality?

Greg Lehey wrote today Is MySQL getting buggier?. The underlying question of his comments is a more fundamental and passionate topic, and especially for me. That is “Software Quality”.

The quintessential question is this. “How do you determine the ‘software quality’ of a product?” And then quickly followed by, “How do you benchmark this with other software products?”

The short answer to second question is simple. You can’t. The reasons why become apparent in addressing the first question. (There’s a mathematical term for this two question situation, another one of the million things to research and remember one day).

15 years ago as part of my masters research I worked on “Improving Software Quality and Software Productivity”. At the time when I started, I found that these were generally considered mutually exclusive. Quite simply, to improve software quality was to decrease productivity, and when you had to improve productivity, quality declined. Unless you had a clear end goal (which you can’t in software, features, cost, satisfaction etc.), it was impossible to measure the total “cost”. It was also impossible to clearly determine impact on key factors, as with software you never have a “control group” working in isolation of other groups trying different things, and being able to do imperial comparison analysis.

In summary, I found that to improve software quality and software productivity you had to reconsider the approach at several different levels.

  1. You had to strive to work towards a simpler business solution, rather then a more complex solution.
  2. You had to leverage converting business knowledge that was held in lengthy and often dated paper documents, into electronically managable content. Some of the benefits included concurrency, consistency and comparative analysis of the knowledge. The early days of CASE repositories were built on this principle, again before they became to complex.
  3. You also had to look at better ways of writing code, and the area I focussed on was code generation. That is “Code writing Code”. (I have a very successful story of 400,000-500,000 Lines of C code. The ODT was a specialised intelligent peer-to-peer selective replication, between a central repository and 32 distributed sites, allowing for independent heterogeneous operations. It provided levels of complexity within replication including two-way master of partial database data, selective row replication between nodes depending on business needs, and collision management. And I did this is1992. A story for another day).

I also found in the business and industry sectors of research, it was near impossible to get a commonality of some components, a higher level of re-use, a higher value of cost savings. There were many factors for this, another topic for another time.

Now that was a long time ago, and a number of things have changed greatly, including my views. These are far more prevalent regarding Open Source.

Some historical general principles in measurement don’t apply today in Open Source. I remember reading recently in the browser wars, that FireFox was “too honest”. Being more open to acknowledge problems/bugs and correcting them, then the Microsoft perspective. How do you compare quality in this situation. (Side Note: I wish there was a search engine that would search all pages that you had viewed, bookmarked etc., rather then searching the WWW for everything. Would make life much easier in finding references like this.)

Open Source does something commercial software can’t do. If you find a bug, and you can’t wait, then fix it yourself, or at worse pay somebody to fix it for you. The scope, functionality and rules are constantly changing, the resourcing and priorities as well. How do you adequately measure something so fluid.

I am a proponent of “Agile Development Methodologies”, having use for many years eXtreme Programming (XP) either rather seriously, or taking small attributes and applying to existing infrastructures. Terms include Test Driven Development (TDD), Feature Driven Development (FDD), Code Coverage, Unit Tests.

Speaking just on Unit Tests. You write a Unit Test before you write your code. It helps to understand what you are trying to achieve before you try to achieve it. It also provides coverage for when the rules change, you write new or changed tests, which allows you to have analysis performed by your code to address problems. I’ve found that writing extensive tests become a self documentation system of the functionality, it highlights what can and can’t occur. Simply ‘grep’ing can offer a clear summarised description of code, even if you have now knowledge of it. In Java you write long method names (some abstract examples would be ‘testFunctionDoesThisAndNotThis”, “testFunctionAcceptsTwoOfThese”, “testFunctionFailsWithValuesOfThis” etc.) I need to provide a more concrete example to better explain.

An Agile approach, also tends to better encapsulate the Web 2.0 principle of considering software as a service, and not a product. This entirely changes the general thought of software development, of the large cost of software development, followed by a maintenance period and cost of the product, followed by a review of potential new products as circumstances change, followed by … (you get the picture)

If you consider an acceptable ongoing cost for providing a service, perhaps pegged against other business revenue/expenses, and then your product was managed by what available resources you had (time, money, people), rather then by functionality requirements you entire perspective changes. You include a high level of user ownership, the release often principle, and the capacity to review and adapt to changing and improving technology and requirements, all these factors change the outcome on how software is development, managed and perceived.

The other thing about all this is, It’s common sense. It’s a simpler approach, and if everybody considered “How can I make this simpler?” first, it would change they way software is developed and perceived.

So getting back onto the topic of Software Quality. I found that in the right circumstances (and there are wrong circumstances), imploring a change in principle towards Agile concepts can have a significant effect on Software Quality. But, Beauty is in the eyes of the beholder. Software Quality is also much the same. What may be acceptable for somebody is unacceptable for somebody else. How then do you measure this?

So after all this, we are left with the question. “How do you determine the ‘software quality’ of a product?” Well, we can only consider a single product in isolation, and we are talking the MySQL server. I should note that you can’t consider all MySQL products in one analysis, different factors including the relative age of the product, the technologies etc. all affect the measurement. You need to consider each product individually.

So why is the number of bugs, or number of open bugs a measure of software quality. What about how big is the product (e.g. lines of code), how complex is the product, how mature is the product, how tested is the product. What we considered the lifespan of critical bugs as a measurement? So what if there were a larger number of critical bugs, but they were fixed immediately is that a good or poor reflection on software quality. If software is in the Web 2.0 “Perpetual Beta”, then are bugs really incomplete functionality.

In software and for organisations, I think an approach could be similar as mentioned by a prospective employer recently. “We know things aren’t the best they could be, but they are better then they were. If our goal could be to improve just 1% each day, then we would be easily 100% better in 6 months.” So with whatever metrics were in place (and you have to have some form of measurement), if there is always a continual improvement, then software quality is always improving.

Differences in syntax between mysql and mysqltest

As I wrote earlier in Using the MySQL Test Suite I found an issue with using the current MySQL Sakila Sample Database as a test with mysqltest.

I was running an older version of 5.1.7 beta so I figured the best course of action was to upgrade to 5.1.11 beta.

Well the problem still exists, and I found that the cause was due to the syntax of the DELIMITER command. Assuming the creation of the schema tables, here is an example of what I found.

Running in an interactive mysql session the following works

DROP TRIGGER ins_film;
DELIMITER ;;
CREATE TRIGGER `ins_film` AFTER INSERT ON `film` FOR EACH ROW BEGIN
    INSERT INTO film_text (film_id, title, description)
        VALUES (new.film_id, new.title, new.description);
  END;;
DELIMITER ;

On a side note, why does DROP TRIGGER not contain the IF EXISTS syntax like DROP DATABASE and DROP TABLE. It’s a shame, it’s a nice SQL extension that MySQL provides.

Now running the same SQL in a mysqltest test (with appropiate create tables) produces the following error

TEST                            RESULT
-------------------------------------------------------
sakila-trigger                 [ fail ]

Errors are (from /opt/mysql/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 53: Extra delimiter "" found
(the last lines may be the most important ones)

It seems, that the DELIMITER command within mysql accepts a line terminator as a statement terminator, while in mysqltest, this is not so.
The solution is to be more strict in the mysqltest, using the following syntax, note the first and list DELIMITER lines are terminated by the appropiate delimiter in use.

DELIMITER //;

CREATE TRIGGER `ins_film` AFTER INSERT ON `film` FOR EACH ROW BEGIN
  INSERT INTO film_text (film_id, title, description)
  VALUES (new.film_id, new.title, new.description);
END//

DELIMITER ;//

Surprisingly, this syntax does then not work in mysql? Is this a bug? I’m note sure, perhaps somebody could let me know.

Using PBXT 0.9.5

Paul has released Version 0.95 of his PBXT MySQL Transactional Storage Engine.

Here is what I did to get it operational under CentOS 4.3.

su -
useradd pbxt
su - pbxt
wget http://www.primebase.com/xt/download/mysql-4.1.16-pbxt-0.9.5.tar.gz
tar xvfz mysql-4.1.16-pbxt-0.9.5.tar.gz
cd mysql-4.1.16-pbxt-0.9.5
./configure --with-debug=full --with-pbxt-storage-engine --without-query-cache --with-mysqld-user=pbxt  --prefix=/home/pbxt/mysql
make
make install
scripts/mysql_install_db
cd /home/pbxt/mysql
./bin/mysqld_safe --user=pbxt --basedir=/home/pbxt/mysql &

Now, lets test and confirm PBXT is operational.

bin/mysql -uroot mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 1 to server version: 4.1.16-pbxt-0.9.5-debug

Type 'help;' or 'h' for help. Type 'c' to clear the buffer.

mysql> show engines;
+------------+---------+------------------------------------------------------------+
| Engine     | Support | Comment                                                    |
+------------+---------+------------------------------------------------------------+
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance     |
| HEAP       | YES     | Alias for MEMORY                                           |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables  |
| MERGE      | YES     | Collection of identical MyISAM tables                      |
| MRG_MYISAM | YES     | Alias for MERGE                                            |
| ISAM       | NO      | Obsolete storage engine, now replaced by MyISAM            |
| MRG_ISAM   | NO      | Obsolete storage engine, now replaced by MERGE             |
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys |
| INNOBASE   | YES     | Alias for INNODB                                           |
| BDB        | NO      | Supports transactions and page-level locking               |
| BERKELEYDB | NO      | Alias for BDB                                              |
| NDBCLUSTER | NO      | Clustered, fault-tolerant, memory-based tables             |
| NDB        | NO      | Alias for NDBCLUSTER                                       |
| EXAMPLE    | NO      | Example storage engine                                     |
| ARCHIVE    | NO      | Archive storage engine                                     |
| CSV        | NO      | CSV storage engine                                         |
| BLACKHOLE  | NO      | Storage engine designed to act as null storage             |
| PBXT       | YES     | Super fast transactional, row-level locking engine         |
+------------+---------+------------------------------------------------------------+
18 rows in set (0.00 sec)

Now to take if for a spin with the MySQL Sakila Sample Database. At this time I need a modified version of the schema, as the sample database is designed for Version 5.

Download sakila.mysql4 script.
Download sakila.pbxt.sql script.

bin/mysql -uroot mysql
mysql>  source sakila.mysql4
mysql> source  sakila.pbxt.sql
# NOTE: Alter table of film_text fails as expected due to FULLTEXT index
mysql> show create table actor;
mysql> exit;

A few warnings in my sakila.mysql4 script I’ll need to review, but it looks ok from an installation step. Now back to my tests that I’m writing to use the Sakila Sample Database which I was working on, so I can really test the ACID state and performance of PBXT.

Using the MySQL Test Suite

MySQL provides two different tools to test the MySQL Server with SQL statements. One is mysqltest and in 5.1 mysqlslap. Both of these tools have quite different purposes. This is a quick review of the usage of mysqltest.

Current Usage

Under Linux deploys, the README in the mysql-test directory gives you all the information you need to know to run.

To run the full test suite.

cd /opt/mysql/mysql-test
 ./mysql-test-run

Rather easy, it does take some time, and I was surprised to find a number of tests marked as ‘skipped’. The general purpose of having tests in a product is to provide coverage of software functionality, and tests should always be forward compatible, unless they are no longer applicable and should be removed. I wonder why the time wasn’t taken to ensure they work in each release.

Should you already have a mysqld running, you can also use this using the -extern syntax to run specific tests against the server. For Example.

./mysql-test-run --extern alias analyze

There are over 700 tests, so this approach is not practical for all tests. It is noted in the README that some tests can’t run in extern mode.
Looking a bit closer at both the server configuration and a running test sheds some light on the types of parameters that are used, and how mysqltest operates.

/opt/mysql/bin/mysqld --no-defaults --server-id=1 --basedir=/opt/mysql --port=9306 --port-open-timeout=380 --local-infile
--exit-info=256 --core --datadir=/opt/mysql/mysql-test/var/master-data --pid-file=/opt/mysql/mysql-test/var/run/master.pid
--socket=/opt/mysql/mysql-test/var/tmp/master.sock --log=/opt/mysql/mysql-test/var/log/master.log
 --character-sets-dir=/opt/mysql/share/mysql/charsets --default-character-set=latin1 --tmpdir=/opt/mysql/mysql-test/var/tmp
--language=/opt/mysql/share/mysql/english/ --innodb_data_file_path=ibdata1:128M:autoextend --open-files-limit=1024
--log-bin-trust-function-creators --loose-binlog-show-xid=0 --rpl-recovery-rank=1 --init-rpl-role=master --key_buffer_size=1M
--sort_buffer=256K --max_heap_table_size=1M --log-bin=/opt/mysql/mysql-test/var/log/master-bin --user=root

/opt/mysql/bin/mysqltest --no-defaults --socket=/opt/mysql/mysql-test/var/tmp/master.sock --database=test --user=root
--password= --silent -v --skip-safemalloc --tmpdir=/opt/mysql/mysql-test/var/tmp --port=9306 -R r/count_distinct3.result

Extending MySQL Test Suite

That’s all nice, but the power of this infrastructure is you can incoporate your own tests. This may be beneficial if you have complex statements and use bleeding edge versions of MySQL. Indeed, it’s another approach to being able to provide to MySQL reproducable test cases if you have a problem.

There are basically 2 steps to incoporating your own tests.

  1. Create a test script in the t subdirectory.
  2. Create a test results file in the r subdirectory.

Using the MySQL Sakila Sample Database as a test case, I created a test with the default sakila-schema.sql and sakila-data.sql files. I did need to replace the DROP and USE DATABASE commands and add a set of DROP TABLE statements.

cd /opt/mysql/mysql-test
vi t/sakila.test
./mysql-test-run --record sakila
more r/sakila.results

Example Results

./mysql-test-run sakila

Stopping master cluster
Installing Test Databases
Removing Stale Files
Installing Master Databases
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables     --basedir=.. --datadir=mysql-test/var/master-data --skip-innodb --skip-ndbcluster --skip-bdb
Installing Master Databases 1
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables     --basedir=.. --datadir=mysql-test/var/master-data1 --skip-innodb --skip-ndbcluster --skip-bdb
Installing Slave Databases
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables     --basedir=.. --datadir=mysql-test/var/slave-data --skip-innodb --skip-ndbcluster --skip-bdb
Manager disabled, skipping manager start.
Loading Standard Test Databases
Starting Tests

TEST                            RESULT
-------------------------------------------------------
sakila                         [ pass ]
-------------------------------------------------------

Ending Tests
Shutting-down MySQL daemon

Master shutdown finished
Slave shutdown finished
All 1 tests were successful.

Now, of surprise with my present version of 5.1.7 beta, a number of statements from the sakila-schema failed.
You can download my working version of the sakila.test from above here. The first failure was:

TEST                            RESULT
-------------------------------------------------------
sakila2                        [ fail ]

Errors are (from /opt/mysql/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 220: query ';
CREATE TRIGGER `ins_film` AFTER INSERT ON `film` FOR EACH ROW BEGIN
INSERT INTO film_text (film_id, title, description)
VALUES (new.film_id, new.title, new.description);
END;;
CREATE TRIGGER `upd_film` AFTER UPDATE ON `film` FOR EACH ROW BEGIN
IF (old.title != new.title) or (old.description != new.description)
THEN
UPDATE film_text
SET title=new.title,
description=new.description,
film_id=new.film_id
WHERE film_id=old.film_id;
END IF;
END;;
CREATE TRIGGER `del_film` AFTER DELETE ON `film` FOR EACH ROW BEGIN
DELETE FROM film_text WHERE film_id = old.film_id;
END;;
DELIMITER ;
--' failed: 1065: Query was empty
(the last lines may be the most important ones)

Now is the time to upgrade to the latest 5.1.11 beta to confirm operations and isolate the errors into smaller tests, and hopefully all is well, if not, I’ve got something to contribute back.

Using Cascade in Foreign Keys

For those that are using a Referential Integrity based Storage Engine in MySQL, the use of Foreign Keys can provide an extra level of data integrity within your application. There are of course caveats to this statement, such as correct design, proper transactional use etc, but that’s a topic for another time. The use of CASCADE has an effect on this.

So, just a quick refresher, using the MySQL Sakila Sample Database (if you don’t have it, get it. This can be a valuable tool in many other areas). Let’s first check out the Schema Diagram. We will look at the first relationship in the top left corner of the diagram.

There is a defined relationship with the actor and film_actor tables. An actor record can belong to one or more film_actor records, and a film_actor record must have one corresponding actor record.

Schema Definition

CREATE TABLE actor (
  actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (actor_id),
  KEY idx_actor_last_name (last_name)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE film_actor (
  actor_id SMALLINT UNSIGNED NOT NULL,
  film_id SMALLINT UNSIGNED NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (actor_id,film_id),
  KEY idx_fk_film_id (`film_id`),
  CONSTRAINT fk_film_actor_actor FOREIGN KEY (actor_id)
         REFERENCES actor (actor_id) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT fk_film_actor_film FOREIGN KEY (film_id)
         REFERENCES film (film_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

Explanation

The above shows the CREATE TABLE syntax as provided in Version 0.8 of the Sakila Sample Database. Looking specifically at the constraint we wish analyse.

  CONSTRAINT fk_film_actor_actor FOREIGN KEY (actor_id)
         REFERENCES actor (actor_id) ON DELETE RESTRICT ON UPDATE CASCADE,

The FOREIGN KEY for a specific column REFERENCES an appropiate table and column, and includes optional UPDATE and DELETE actions.

The ON DELETE RESTRICT is indeed optional as this is the default operation, but is a good practice for both readibility and future compatibility.
The ON UPDATE CASCADE in this case, is indeed unwarranted as the actor primary key is an AUTO_INCREMENT column and should never be updated. By good design definition, a PRIMARY KEY should never be updated in a table, and then should be set to RESTRICT. If however you ever wish to update a primary key value in RESTRICT mode, you can’t as to first UPDATE children before the parent is to set the value to an unknown value that values validation. The best approach here is always use AUTO_INCREMENT primary keys, and never update the values.

Use

So let’s take the current implementation for a spin to see how the default RESTRICTED implementation operates.

mysql> select count(*) from film_actor where actor_id=1;
+----------+
| count(*) |
+----------+
|       19 |
+----------+
1 row in set (0.05 sec)
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from actor where actor_id=1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
    (`sakila/film_actor`, CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`)
   ON UPDATE CASCADE)
mysql> rollback;
Query OK, 0 rows affected (0.06 sec)

As you can see, an attempt to delete an actor fails due to a foreign key constraint failure.

The correct implementation for this design is to delete child relationships first. For example.

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from film_actor where actor_id=1;
Query OK, 19 rows affected (0.00 sec)
mysql> delete from actor where actor_id=1;
Query OK, 1 row affected (0.00 sec)
mysql> rollback;
Query OK, 0 rows affected (0.06 sec)

Let’s now try CASCADE.

mysql> alter table film_actor drop foreign key fk_film_actor_actor;
Query OK, 5462 rows affected (0.78 sec)
Records: 5462  Duplicates: 0  Warnings: 0
mysql> alter table film_actor add CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`) ON DELETE CASCADE;
Query OK, 5462 rows affected (0.69 sec)
Records: 5462  Duplicates: 0  Warnings: 0
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from film_actor where actor_id=1;
+----------+
| count(*) |
+----------+
|       19 |
+----------+
1 row in set (0.01 sec)
mysql> delete from actor where actor_id=1;
Query OK, 1 row affected (0.00 sec)
mysql> select count(*) from film_actor where actor_id=1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)

As you can see, by changing the definition to CASCADE, the requirement to delete all child rows is nullified.

Considerations

When designing I try to avoid the use of CASCADE. While it has it’s benefits, it is a “catch” by the database, and can hide poor design and coding techniques. In can however in more heavily loaded systems, provide a performance improvement, as additional commands for deleting don’t incur the network communications overhead. I’ll need to add this as a note to my performance testing guidelines to see what effect this has.

Another benefit on a more strict approach is MySQL for example currently does not fire triggers resulting on cascading constraint operations.

Try it yourself

As you can see it’s easy to use. To try it yourself.

  • Use an appropiate storage engine, e.g. InnoDB. You can use ALTER TABLE [table] ENGINE=InnoDB; on your tables.
  • Add an index to the target (foreign key) column, your source (primary key) will have an index by definition of being a primary key.
  • Ensure your source and target columns are the same data type
  • Create the appropiate constraint.

While the MySQL Manual refers to both the source (primary) and target (foreign) columns must have Indexes, I’ve found this not to be the case in 5.1. An ALTER TABLE will auto create the target (foreign) columns index if it doesn’t exist.

Just note, traps for young players. Both the source (primary) and target (foreign) columns must have the identical datatype definition. A common problem I’ve seen is that UNSIGNED is used for a Primary Key definition and not for a foreign key definition.

References
Wikipedia definition of “Referential Integrity”
MySQL Manual Page – FOREIGN KEY Constraints
MySQL Sakila Sample Database Forum
MySQL Forge Wiki on Sakila Sample Database

What constitutes a good error message to the user?

Today, will go down in my professional history as quite possibly the lowest I would ever think of a software developer. I’ve carefully avoided the term “fellow coder”, speaking of a IT industry sticking by fellow IT people, but not today.

I presently support an existing production system, some 1000+ users that’s been in place around 3 years in which I’ve had no prior involvement until recently. Integration with other external parties within the system have provided an esclation in errors being reported in this external communication, and lack of adequate feedback to the user is another topic. Email is the means of reporting administratively of user errors, again another topic of issue. Within these emails, which are almost impossible to manage due to the limited internal GUI only toolset and lack of access to actual email account files to automate scripting (yet another topic? Do you see a trend here), is some relevent information regarding the transaction, and then most importantly the error message. The thing I need to investigate the problem.

The line reads (removing some stuff):


Error Code: 234567892

Ok, well a little cryptic, but surely you can work out from the external system what this means. Investigation of some more errors, in the mail GUI product, yet another series of open windows (you can’t view messages like a regular mail client with a summary list and a detail panel), provides for a trend in the error messages:


Error Code: 1235492567
Error Code: -434783134
Error Code: 34345199

The trend being there is none. Of course today by mid morning the email error count is into the hundreds, and I’m none the wiser. Well time to closely investigate the code management (as I’ve already contacted the external party, and asked if I can provide some error codes to receive greater information).

The following constitutes the two lines of code taken in the determination of the error messages I’ve shown so far. Note, this code takes the external system response, and then “attempts to determine usefull error content for presentation back to the user”.


errorNo = new Random().nextInt();
error = “Error Code: ” + errorNo;

Now while everybody laughed out loud, including fellow developers, DBA, IT manager, the Business Owners and Users (which can’t read code but could understand the first of these two lines I highlighted), and yes it really was funny in context with a bigger picture, but really it wasn’t funny at all. Some things make my blood boil, and this was one of them. With all the time lost between multitudes of people, users, call centre etc, I’d never felt a stronger conviction to hunt down the developer that wrote this.

The end of the story is after even trolling old CVS repository entries I was unable to piece sufficient information to determine the author. Most likely done pre version control, and then that trail leads to only a few names I’ve heard mentioned before.

I’d like to see somebody top that one!

MySQL Ideas

Seems I have over time, thought of many ideas, jotted some notes on some, and even done some work, but everybody knows that “home projects” can take a long time.

Here are a few that have resurfaced over the past month, and I doubt I’ll ever get to them, or perhaps some other enterprising person has already done a similar thing. Of course most are for my own personal/professional gratification, but input from others of “great idea, when do we see it” could sway my interests.

INFORMATION_SCHEMA for MySQL Version 4

Why?

Well, quering the INFORMATION_SCHEMA is very cool, and long overdue for information gathering including statistics, schema definitions, schema version comparision tools etc. Of course there are concerns regarding the performance of using the INFORMATION_SCHEMA, and any design should significantly consider this limitation. The INFORMATION_SCHEMA is not physically represented as a normal MySQL database with tables and real data files, and then can be optimised appropiately, queries reference various sources of information and return the results at query time.

How would I achieve this? Here is an approach. The biggest issue would be what programming technology and non-Linux support.

  • Generate a physical INFORMATION_SCHEMA create SQL script from the most current database version. This would allow you to create a real database with real tables.
  • Population. Well at least some tables will be empty, but some work would be required to the population of other information. A first pass with some easy shell scripting and SHOW commands would easily populate information for SCHEMATA (SHOW DATABASES), TABLES (SHOW TABLES), COLUMNS (DESC [table]). Some are covered by the mysql schema. You get the picture. Of course some information will be more more difficult to populate easily.
  • Syncing. Populating is one thing, but if the information is not current then it becomes useless. Depending on your organisations management, in a controlled environment, simply re-running the population process when schema information is changed, or run nightly may work. A more brut force approach may be to look at filesystem timestamps of the database table definitions and trigger when changes occur, but I suspect this would a poor approach.

Storage Engines in use

I’ve often wondered, do people use X storage engine.

I would love to see plastered on the MySQL Home Page stats of 99,999,999 Downloads, 99,999,999 Installations reporting in, and then some breakdowns of database versions, storage engine usage, even demographics. This type of statistical information can be great, but can also be dangerous to competitors, or can be too detailed to become useful. Consider the objective it a general census.

I recently was introduced to Fotolog, and in the top left corner is Members, Photos, and Photos today. Of course now they have volume the numbers are impressive. You blink, do a refresh and they have changed.

Having a more general information goal approach, rather then detailed would be my guidelines. Here are some points.

  • Each MySQL installation is registered providing it a unique id. This is used for all collection and communication, and there is never any real reference back to organisation/name etc. How this works correctly is a little difficult, but would need to involve some checksuming of information like hostname,os,mysql version,maybe ip.
  • A script runs to generate basic statisical information, including unique installation id, present GMT date/time, MySQL version, master/slave, installation date, number of databases, breakdown of storage engines used per database/installation, total number of tables per database/installation.
  • The Information needs to be recorded, so it can be transmitted, and also used for historical comparision, best option would a flatfile on the filesystem.
  • Having an automated approach to then say email this information to a known server. There would need to be some means to have some authentication, and feedback to confirm success. Using email is not an ideal approach, but is a more readily available medium. Another easy means would be a webservices approach, but would require a more direct internet connection, while mail could overcome this means.
  • More advanced ideas would be to record information in and XML format, have a XML storage engine, and enable reports to be run against this collated information, so the System Adminstrator has at there finger tips historical details that are gathered. The benefit of an XML approach is this can be more easily collated with clever approaches with XSLT for example. The fallback would could be a CSV file, but the information being sent, may well need to be more structured for various reasons.
  • Of course if it was limited to Version 5 up, the benefits of stored procedures, events and UDF’s a lot of this could be implemented in MySQL specific technologies, but what about the bulk of installations pre Version 5, you would definitely what to include these.
  • My concerns would definitely be SPAM abuse, or false figures, so these points alone may nullify the process even being worth it.

This is just a broad stroke overview, I’ve got more detailed analysis of pro’s, con’s, example file layouts, and a means of collating the information and providing dynamic reporting on the server.

Of course there are a few ideas I have implemented, and I’m still passionate about and working on in that “spare time” we all never have.

ShowCase MySQL

A showcase of MySQL (via existing popular Open Source apps). I’ve done this via a VMWare Appliance called UltimateLAMP.

I did this for two reasons, first of course I wanted to easily show applications that could be used without any technical requirements (indeed the hardest part is installing the free VMware player), and second, if there was more penetration of possible MySQL options in my present home city of Brisbane, it may provide some MySQL work opportunities. Of course, this an application approach to demonstrating MySQL, rather then a MySQL features approach, and is dependent then on the community and these applications may be less then ideal in a large scale deployment or may not be of a quality to be placed with MySQL, but it’s one way to get leverage.

I would like some way to poll usage of MySQL within organsiations my home town, but it’s a very difficult approach, and I’m sure some organisations would not even know they are using MySQL. The process may also provide a means of providing a service, say quarterly updates of software (MySQL and open source products), customisations, but I think it would be a difficult sell. Having an army of people applying these ideals across a number of cities may provide a more marketable approach.

ACID Performance Benchmark

Leveraging the Apache project JMeter, which has all the infrastructure for handling theads, reporting, graphing, and volume load testing, I’ve written to enhancements for JMeter JDBC Sampler, the first was CALL support, enabling calls to stored procedures. The second is TRANSACTION support enabling a number of SQL statements to be execute in a true transaction, all pass or all fail.

These allow me to provide an application specific performance approach. Now MySQL has various means of testing, the mysql-test that comes with installations (I’ve got more on this for a later time), as well as mysqlslap (again, some more points on this at a later time). Both of these tools have benefits, but I wanted a more application specific approach. Using JMeter provides an overhead, but it provides a more realistic application approach, my goal is not to get the maximum thoughtput possible, being as close to the screws, but to provide a more realistic approach.

Of course, my objective is to use application specific SQL, the SQL you run, this would need to be gathered via various means. Information gathering including what types of transactions, the interactions, the volumes of use would be needed to create realistic approaches. The benefits of JMeter for example, is I can simulate various transactions, enable them to interact, enable them to have implied delays to simulate user use. These are some of the advantages over the current MySQL performance approaches.

Providing this level of simulation however is only half the requirement. This provides for a benchmark for performance, but what about validation. Sure the system can be used with 100 threads with X Insert Orders, and Y Select Orders, and Z Invoice Prints etc per thread, but what if the there were issues and errors.

Data Quality is a key interest and this level of benchmarking also has to have the extent of validating the results, and taking appropiate actions for failures. I’m planning also that this can be used by enabling Referential Integrity (via an appropiate storage engine) on an existing application and then confirming transactional state of queries within applications that are not presently transaction safe.

Presently, I using this approach to assist in the testing of the MySQL Community Storage Engine PBXT. I’m hoping in conjunction with the Sakila Sample Database and various data sets, that I can also publish comparision results with other Storage Engines.

Schema Designer Analyser

My background is database modelling and large systems design, and I’m writing a A MySQL Guide to Database Modelling, Design and Support (still under development). This is an online and very simple approach for open source development to improve it’s image in database modelling. Keeping things simple will help all those non-professionals out there.

To back up this article, which has a checklist for organisations to follow and then incoporate into standards, is a tool to do the analysis that the checklists requires.

I had hoped to initial develop this using the GRT portion of the MySQL Workbench, but I’ve decided to make that a secondary project for now, focusing on a tehnology of greater understanding, possibly Java. Of course the biggest issues is accurate gathering of schema information, so you can see why a INFORMATION_SCHEMA for MySQL version 4 would come in handy.

Conclusion

I’ve got a lot more ideas, and discussions with other MySQL peers rekind ideas I’ve thought of previously. I really should get to putting them all out there to see how many people think I’m crazy, or to find other crazy people to contribute, or even fund! One can dream.

How I work

My work life is really fragmented at present, so I’ve decided a split approach in answer to Dave Rosenberg’s How I Work–what I have learned so far .

What is your role?
Support Developer providing sole support an internal client web based system (Java, Oracle) with 1000+ users producing >$1m daily revenue. Independent Database Consultant. Specialising in Database Modelling, Large Systems Design and Web Development Technologies. Strong proponent of Agile Development Methodologies (specifically Extreme Programming – XP).
What is your computer setup?
Dell Optiplex GX280 (P4, 1GB, 17″ CRT) running Redhat FC4.
(When I started only 2 months ago I got lumped with a similar speced machine running Windows XP, and upteen windows apps (like 4 different versions of TOAD for example) . This OS was contry to my understanding provided in the interview process.
Primary – Dell Inspiron 5150 (P4/3.2GHz, 1GB, 120GB, 15″ + 21″ CRT)
Secondary – Generic (P3/600MHz, 1GB, lots of HDD ~500GB, 17″ CRT)
Both with very impressive Saitek backlit keyboards (one red, one blue) Great at night. See image at bottom.
What desktop software applications do you use daily?
RedHat Fedora Core 4
Eclipse 3.1, FireFox 1.5, Thunderbird 1.5, WebSphere Application Server 5.1, J2DK 1.4.2, Oracle SQL Developer, Open Office 2, XMMS and SSH client (which I use most)
Due to legacy internal systems and support I also must run under Wine (Internet Exploder), and First Class (email client).
Not to stop there, I also must run under Citrix ICA Client apps (FocalPoint, Heat Call Logging, and Microsoft Word for internal forms that won’t work under OO2.) And before somebody suggests why not try VMWare or other clients, I have tried, but software like Focalpoint can’t install?
CentOS 4.3
FireFox 1.5, ThunderBird 1.5, Gaim, SSH, Skype, Open Office 2.
Maybe not all of these every day, but some combination of each day –> MySQL 4.1, MySQL 5.0, MySQL 5.1, MySQL Workbench, Eclipse 3.1, J2DK 1.4.2, J2DK 5.0, Apache Tomcat 5.0.28, Apache Httpd 2.0.53, JMeter.

Presently also configuring a new laptop drive running Ubuntu Dapper 6.06 RC. For the Record, Beta 2, Flight 6 and Flight 7 all failed.
What websites do you visit every day?
Internal Wiki (all day)
At Lunch Google Sci-Tech News
PlanetMySQL
Google Sci-Tech News
PlanetMySQL (if I haven’t already got to them).
What mobile device or cell phone do you use?
N/A NEC e606 3G phone (which I’ve had for probably 3 years now) operating on a 3G network with video calls, again for 3 years.
Do you use IM?
No. All access is blocked other then an internal Jabber server, that I use rarely, never for communication, just a cut/paste of command or syntax. Speaking of blocked, SSH access to my production servers is blocked, and even reading news like Skypes Call Out is blocked by WebSense. Extensively, however due to current employement policy, I’m very hamstrung unless before/after hours.
I use Gaim as I have AIM, MSN and Google Talk accounts, and Skype. My preference was always AIM, but as clients come and go, I’ve had to accumulate more accounts.
Do you use a VoIP phone?
No. Not at present, however for many years I worked with US clients and used Packet 8. Still have the hard phone somewhere.
I’ve also used Skype talk for one or one or conference calls. Of late in Australia to New Zealand and Singapore. Indeed, the quality to Singapore has been excellent, when living in the US, calls to Singapore on Skype were clearer then my Packet8 phone to US numbers.
Do you have a personal organization/time management theory?
Current contract employees a number of disjointed methods, which in observation just shows so much inefficency, it’s worth documenting just to highlight what not to do. We have daily team meetings (10 mins), each listing your top 2 daily tasks. Weekly we have to also submit weekly goals. Weekly combined meetings with another team where we again give weekly top 2 tasks. We use two seperate systems (with manual double entry) for work identification, one for call centre logging issues, and Bugzilla for software bugs, and enhancements. We also use XPlanner (again duplicating a lot of tasks) for time management.
With all this rigid structure, I am daily given either other work to do, or investigate, and in over 3 months, I would rarely end a week anywhere near where it was so described at the start of the week.
With all the technology possible, I actually do not have any electronic management gadget, never had. I use a combination of notebook, plain paper (usually for daily notes etc, which I discard regularly), a diary, and normally a lot of emails which I normally send to/from home.
Given that email is used so much, I basically use Draft Emails for any electronic notes.

Anything else?
Perhaps there is merit in How I work now, and How I’d like to work now.

Saitek Keyboard

A better VNC

I’ve been using VNCViewer from RealVNC under Linux to remote connect to an older machine running windows. Two reasons, I don’t need yet another screen on my desk, and I need windows to adequately test and use the MySQL GUI products, in particular MySQL Workbench.

I never realised there was a better way, particularly over a local network, a command called rdesktop which is capable of natively speaking Remote Desktop Protocol (RDP).

$ su -
$ yum install rdesktop

This did only give me version 1.3.1. The current version 1.4.1 available from the rdesktop Website does provide greater commands including the syntax I use.

su -
cd /src
wget http://optusnet.dl.sourceforge.net/sourceforge/rdesktop/rdesktop-1.4.1.tar.gz
tar xvfz rdesktop-1.4.1.tar.gz
cd rdesktop-1.4.1
./configure
make
make install

Example Usage

$ rdesktop -u <username> -p <password> -K -N -z -a 16 -x b -P 

Some additional common options include -d <domain> and -f for fullscreen.

Comand Syntax

$ rdesktop
rdesktop: A Remote Desktop Protocol client.
Version 1.4.1. Copyright (C) 1999-2005 Matt Chapman.
See http://www.rdesktop.org/ for more information.

Usage: rdesktop [options] server[:port]
   -u: user name
   -d: domain
   -s: shell
   -c: working directory
   -p: password (- to prompt)
   -n: client hostname
   -k: keyboard layout on server (en-us, de, sv, etc.)
   -g: desktop geometry (WxH)
   -f: full-screen mode
   -b: force bitmap updates
   -L: local codepage
   -B: use BackingStore of X-server (if available)
   -e: disable encryption (French TS)
   -E: disable encryption from client to server
   -m: do not send motion events
   -C: use private colour map
   -D: hide window manager decorations
   -K: keep window manager key bindings
   -S: caption button size (single application mode)
   -T: window title
   -N: enable numlock syncronization
   -X: embed into another window with a given id.
   -a: connection colour depth
   -z: enable rdp compression
   -x: RDP5 experience (m[odem 28.8], b[roadband], l[an] or hex nr.)
   -P: use persistent bitmap caching
   -r: enable specified device redirection (this flag can be repeated)
         '-r comport:COM1=/dev/ttyS0': enable serial redirection of /dev/ttyS0 to COM1
             or      COM1=/dev/ttyS0,COM2=/dev/ttyS1
         '-r disk:floppy=/mnt/floppy': enable redirection of /mnt/floppy to 'floppy' share
             or   'floppy=/mnt/floppy,cdrom=/mnt/cdrom'
         '-r clientname=': Set the client name displayed
             for redirected disks
         '-r lptport:LPT1=/dev/lp0': enable parallel redirection of /dev/lp0 to LPT1
             or      LPT1=/dev/lp0,LPT2=/dev/lp1
         '-r printer:mydeskjet': enable printer redirection
             or      mydeskjet="HP LaserJet IIIP" to enter server driver as well
         '-r sound:[local|off|remote]': enable sound redirection
                     remote would leave sound on server
   -0: attach to console
   -4: use RDP version 4
   -5: use RDP version 5 (default)

I haven’t reviewed the security implications but considering I’m only running in my own internal network, it’s not a major priority.

References

Improving Open Source Databases – WordPress

As part of both my UltimateLAMP side project, and my greater involvement with MySQL Workbench, I’ve been wanting to review and document the database schemas of Open Source products.

Indeed, as part of discussions with Mike Zinner of MySQL AB at the recent MySQL Users Conference, I suggested an idea used in the Java World, for example by Clover and Agitar, where to promote the usefullness of their respective tools (in this case code coverage), they provide results against Open Source Products. For an example check out Clover Examples.

With MySQL Workbench, to get some greater exposure of the use of graphical Data Modelling tool, I mentioned the idea of providing a respository of schemas, may help in exposure, usage and feedback of MySQL Workbench. Indeed my work is towards this being added to MySQL Forge, however I’ve found that at my first indepth work, using WordPress 2.0.2 has been a less then successful experience.

The schema would rate a 2 out of 5 on my scale of optimal database schemas. Here are some of my initial findings.

  • Inconsistent naming standards
    • ID uppercase for primary keys in some tables, but not in others.
    • different standards for primary keys, some id, some [table]_id
    • Inconsistent identification of foreign keys (e.g. post_parent). All keys should have _id.
    • Inconsistent datatypes for optional foreign keys (e.g. default of 0 rather then NULL)
    • Inconsistent column naming standards, some tables prefix columns with table name or table like name, (e.g. post_), but not all columns. Some tables has multiple prefix values (e.g. cat_, category_)
  • Poor usage of datatypes. Examples include BIGINT(20) for primary keys, TEXT for post title, INT(10) for columns which as values a TINYINT can support.
  • Inconsistent use of enums (e.g. Y/N for some values, and yes/no for others)
  • No Referential Integrity support (ie. Foreign Keys), limiting the benefits of a modelling tool showing table relationships
  • Different rules for primary keys (most tables have a suggorate key, but not all)

I am actually working on A MySQL Guide to Database Modelling, Design and Support, which covers a lot of these points and more. I’m some way from completing this document, but this initial research of the WordPress product highlights it’s importance to the open source community.

Of course we can just make noise about what we don’t like, or we can do something about it. I’m submitting my ideas, suggestions and code changes to the project. What is done about it is outside of my control, but I’ve at least given my input to this Open Source Product.

WordPress 2.0.2 Schema SQL (taken from a mysqldump after installation)

Here is a graphical representation of the Data Model from MySQL Workbench.
NOTE: My MySQL Workbench will not enable me to do a Export Image, so I’ll need to add this at a later time

Introducing Referential Integrity

Unfortunately this is no automated means of creating appropiate Referential Integrity for a schema. With more strict naming standards, it would be possible to reconstuct this information generally.

The approach to create these new rules were performed manually, but a tool could be developed to provide approximations for a number of these.

# Implement Foreign Keys. Changes include:
#  - Appropiate Storage Engine
#  - ensure column is of same definition as matching primary key (e.g. datatype + unsigned)
#  - create index for foreign key
#  - ensure optional keys use NULL

ALTER TABLE wp_categories ENGINE=InnoDB;
ALTER TABLE wp_comments ENGINE=InnoDB;
ALTER TABLE wp_linkcategories ENGINE=InnoDB;
ALTER TABLE wp_links ENGINE=InnoDB;
ALTER TABLE wp_options ENGINE=InnoDB;
ALTER TABLE wp_post2cat ENGINE=InnoDB;
ALTER TABLE wp_postmeta ENGINE=InnoDB;
ALTER TABLE wp_posts ENGINE=InnoDB;
ALTER TABLE wp_usermeta ENGINE=InnoDB;
ALTER TABLE wp_users ENGINE=InnoDB;

ALTER TABLE wp_usermeta MODIFY user_id  bigint(20) UNSIGNED NOT NULL DEFAULT '0';
ALTER TABLE wp_usermeta ADD FOREIGN KEY usermeta_users_fk (user_id) REFERENCES wp_users(ID) ON DELETE CASCADE;

ALTER TABLE wp_posts MODIFY post_author BIGINT(20) UNSIGNED NOT NULL;
CREATE INDEX post_author ON wp_posts(post_author);
ALTER TABLE wp_posts ADD FOREIGN KEY posts_users_fk (post_author) REFERENCES wp_users(ID) ON DELETE CASCADE;

ALTER TABLE wp_posts MODIFY post_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_posts SET post_parent = NULL WHERE post_parent = 0;
CREATE INDEX post_parent ON wp_posts(post_parent);
ALTER TABLE wp_posts ADD FOREIGN KEY posts_posts_fk (post_parent) REFERENCES wp_posts(ID);

ALTER TABLE wp_postmeta MODIFY post_id  bigint(20) UNSIGNED NOT NULL DEFAULT '0';
ALTER TABLE wp_postmeta ADD FOREIGN KEY postmeta_posts_fk (post_id) REFERENCES wp_posts(ID) ON DELETE CASCADE;

ALTER TABLE wp_categories MODIFY cat_ID BIGINT(20) UNSIGNED NULL DEFAULT NULL AUTO_INCREMENT;
ALTER TABLE wp_categories MODIFY category_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_categories SET category_parent = NULL WHERE category_parent = 0;
ALTER TABLE wp_categories ADD FOREIGN KEY categories_categories_fk (category_parent) REFERENCES wp_categories(cat_ID);

ALTER TABLE wp_post2cat MODIFY rel_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_post2cat MODIFY post_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_post2cat MODIFY category_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_post2cat ADD FOREIGN KEY post2cat_posts_fk (post_id) REFERENCES wp_posts(ID) ON DELETE CASCADE;
ALTER TABLE wp_post2cat ADD FOREIGN KEY post2cat_categories_fk (category_id) REFERENCES wp_categories(cat_ID) ON DELETE CASCADE;
ALTER TABLE wp_comments MODIFY user_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_comments SET user_id = NULL WHERE user_id = 0;
CREATE INDEX user_id ON wp_comments(user_id);
ALTER TABLE wp_comments ADD FOREIGN KEY comments_users_fk (user_id) REFERENCES wp_users(id) ON DELETE CASCADE;
ALTER TABLE wp_comments MODIFY comment_post_ID BIGINT(20) UNSIGNED NULL DEFAULT NULL;
ALTER TABLE wp_comments ADD FOREIGN KEY comments_posts_fk (comment_post_ID) REFERENCES wp_posts(ID) ON DELETE CASCADE;

ALTER TABLE wp_comments MODIFY comment_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_comments SET comment_parent = NULL WHERE comment_parent = 0;
CREATE INDEX comment_parent ON wp_comments(comment_parent);
ALTER TABLE wp_comments ADD FOREIGN KEY comments_comments_fk (comment_parent) REFERENCES wp_comments(comment_id);

ALTER TABLE wp_linkcategories MODIFY cat_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_links MODIFY link_category BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_links ADD FOREIGN KEY links_category_fk (link_category) REFERENCES wp_linkcategories(cat_id);

While there are no column name changes, by default the WordPress code should operate with this revised schema. Issues would include a poor delete strategy that still voilates the liberal constraints now enforced. Special handling of 0 for optional columns when the value is now NULL may also be a problem.

Revised Data Model in MySQL Workbench.
NOTE: My MySQL Workbench will not enable me to do a Export Image, so I’ll need to add this at a later time

Introducing improved Standards

Of course, I’d like to see the schema improve, here are my first draft ideas for improvements of the schema.

# Naming Improvements ===================================================================================
# Rename to lowercase id
ALTER TABLE wp_users CHANGE ID id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_posts CHANGE ID id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_categories CHANGE cat_ID cat_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_ID comment_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_post_ID comment_post_id BIGINT(20) UNSIGNED NOT NULL;

# Include _id for all Foreign Keys
ALTER TABLE wp_posts CHANGE post_author post_author_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_posts CHANGE post_parent post_parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_categories CHANGE category_parent category_parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_comments CHANGE comment_parent comment_parent_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;
ALTER TABLE wp_links CHANGE link_category link_category_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;

# Primary Key Name Standardisation
ALTER TABLE wp_posts CHANGE id post_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_users CHANGE id user_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;

# Foreign Key Standardisation
ALTER TABLE wp_categories MODIFY category_parent_id parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_comments CHANGE comment_parent_id parent_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;

# Other Column Standaisations
ALTER TABLE wp_categories MODIFY cat_name category_name VARCHAR(55) NOT NULL;

# Column width improvements
ALTER TABLE wp_categories MODIFY category_count MEDIUMINT(5) NOT NULL DEFAULT '0';
ALTER TABLE wp_posts MODIFY post_title VARCHAR(255) NOT NULL;
ALTER TABLE wp_linkcategories MODIFY cat_name VARCHAR(255) NOT NULL;
ALTER TABLE wp_linkcategories MODIFY cat_id TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_options MODIFY autoload ENUM('Y','N') NOT NULL DEFAULT 'Y';

# Obvious Index Improvements
# Make login a unique index
DROP INDEX user_login_key ON wp_users;
CREATE UNIQUE INDEX  user_login_key  ON wp_users(user_login);

These will require code changes and adequate testing, something I’m not hopeful would happen unless I drove it, but you never know, the next time I review the WordPress Schema I might be impressed.

Conclusion

This is just a broadstroke first review of the schema. As part of all good design, testing of the code, further review and refinement based on better understanding of the product, and also peer review are all important steps not yet undertaken.

Migrating an MyISAM schema to use Referential Integrity

Here are some steps involved. Using the current MySQL defacto engine InnoDB. Of course, Falcon, PBXT and others will enable alternative engines to be used.

Convert Table Storage Engine Types

$ mysql -u[user] -p[password] [database] -e "SHOW TABLES" | grep -v "Tables_in" | sed -e "s/^/ALTER TABLE /" | sed -e "s/$/ ENGINE=InnoDB;/" > upgrade.sql
$ mysql -u[user] -p[password] [database] < upgrade.sql

NOTE: This may not work for all tables, for example those with FULLTEXT indexes will fail.

For the introduction of Referential Integrity we need to ensure the following.

  • Each Foreign Key column should have an index. Prior to 4.1 I think this was a requirement, however it's a good general practice regardless for SQL performance.
  • The datatype must match between Primary Key and Foreign Keys. The most obvious oversight is normally UNSIGNED, however you also for example have INT and INT, and not INT and BIGINT as datatypes.
  • Optional Foreign Keys must contain NULL values and not the normal practice of having a default of 0.

A Sample Foreign Key Syntax.

mysql> ALTER TABLE wp_usermeta ADD FOREIGN KEY usermeta_user_fk (user_id) REFERENCES wp_user(id);
ERROR 1005 (HY000): Can't create table './wordpress/#sql-cd9_10.frm' (errno: 150)

A closer investigation of what this error really is:

$ perror 150
MySQL error code 150: Foreign key constraint is incorrectly formed

A confirmation of table definitions.

mysql> desc wp_usermeta;
+------------+---------------------+------+-----+---------+----------------+
| Field      | Type                | Null | Key | Default | Extra          |
+------------+---------------------+------+-----+---------+----------------+
| umeta_id   | bigint(20)          |      | PRI | NULL    | auto_increment |
| user_id    | bigint(20) unsigned |      | MUL | 0       |                |
| meta_key   | varchar(255)        | YES  | MUL | NULL    |                |
| meta_value | longtext            | YES  |     | NULL    |                |
+------------+---------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

 mysql> desc wp_users;
+---------------------+---------------------+------+-----+---------------------+----------------+
| Field               | Type                | Null | Key | Default             | Extra          |
+---------------------+---------------------+------+-----+---------------------+----------------+
| id                  | bigint(20) unsigned |      | PRI | NULL                | auto_increment |
| user_login          | varchar(60)         |      | UNI |                     |                |
| user_pass           | varchar(64)         |      |     |                     |                |
| user_nicename       | varchar(50)         |      |     |                     |                |
| user_email          | varchar(100)        |      |     |                     |                |
| user_url            | varchar(100)        |      |     |                     |                |
| user_registered     | datetime            |      |     | 0000-00-00 00:00:00 |                |
| user_activation_key | varchar(60)         |      |     |                     |                |
| user_status         | int(11)             |      |     | 0                   |                |
| display_name        | varchar(250)        |      |     |                     |                |
+---------------------+---------------------+------+-----+---------------------+----------------+
10 rows in set (0.00 sec)

A second pair of eyes (thanks Jon), showed that I actually spelt a table name wrong. Did you spot it. Of course it would have been nice if the error message actually told me this. This rather bland message Can’t create table could actually mean.

  • missing index (pre 4.1)
  • incompatible columns data types
  • Invalid Table.

I’m sure if I tried to break it I’d find more examples, but just a trap for unsuspecting people.

Now migrating an existing schema to using Referential Integrity provides some initial benefits (row level locking, misplaced key updates/deletes) but it does not provide true integrity unless your application has been written to support transactions. Chances are it hasn’t, but this is at least the first step.

Restyling a Mediwiki Installation – Lesson 1

Following my implementation of UltimateLAMP, read heaps more at this thread, I undertook to provide customizations of a MediaWiki Installation. Here is the first lesson that you can undertake if you wish to beautify the default MediaWiki Installation.

For the purposes of this demonstration, I am going to help out Jay & Colin and propose a restyle the MySQL forge to fit in with the default Home Page. Hopefully you will see it there soon!

Lesson 1 – Updating the default Monobook.css

There are several different ways to make style changes, the simplest is to customize the system wide Monobook.css, and this Lesson will focus on this.

By accessing the link [http://my.wiki.site/]index.php/MediaWiki:Monobook.css you will be able to make the following changes.

The best way to approach this, like any good programming style, make small changes, testing and confirmation and continue.

Note: For all screen prints, click for a larger image

1. Cleanup Backgrounds

body        { background-image: none; background-color: #FFFFFF;}
.portlet    { background-color: #FFFFFF; }

The MediaWiki Page is made up of three sections, these are the represented by styles .portlet, #content, and #footer. For the purposes of our first example, the content section and the footer section are already white.

==>

At this point I should recommend that you use FireFox for future work. You should then install the Web Developer Add-on. What results from this plugin is invaluable information about the internals of a web page. The two options most useful for this exercise is:

  • Information | Display Id and Class Details.
  • Information | Display Element Information (move the cursor around to get information)
==>

2. Cleanup Borders

I don’t feel that borders around things are warranted. I’m more a clean look kinda guy. Remove all borders, say one to separate the footer from the page.

#content    { border-width: 0px; }
.portlet .pBody
            { border-width: 0px; }
#footer     { border-top: 1px solid #888888; border-bottom-width: 0px; }
==>

Maybe, that’s a little too clean. Add some separators on left side options.

#p-navigation,
#p-search,
#p-tb       { border-top: 1px dotted #888888; }
==>

3. Links

Using the Forge Styles http://forge.mysql.com/css/shared.css we can adjust the links accordingly.

a:link      { color: #10688E; text-decoration: none; }
a:hover     { color: #003366; text-decoration: underline; }
a:visited   { color: #106887; text-decoration: none; }

a.new:link  { color: #AA0000; text-decoration: none; }
a.new:hover { color: #AA0000; text-decoration: underline; }

==>

4. Content

Ok, we have played around a little, now to move into some more serious changes. Looking at the general page look and feel that you see in most page content.


html,body,p,td,a,li
            { font: 12px/19px Verdana, "Lucida Grande", "Lucida Sans Unicode", Tahoma, Arial, sans-serif; }

h1          { font: bold 24px Helvetica, Arial, sans-serif; color: #EB694A; letter-spacing: -1px;
              margin: 1.5em 0 0.25em 0;
              border-bottom: 1px dotted #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h2          { font: bold 18px Helvetica, Arial, sans-serif;  color: #EB694A; letter-spacing: -1px;
              margin: 2em 0 0 0;
              border-bottom: 1px dotted  #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h3          { font-size: 12px; color: #6F90B5; }

h4          { font-size: 12px; color: #6F90B5; }

At this time, I’ve created an Example Page to better demonstration of the look and feel, as the default MediaWiki Main page has limited content.

Some Text Content at the start of the page.

= A First Level Heading =
The first paragraph text.
[[Main Page | Internal Link to existing Page]],
[[Nonexistent Page | Internal Link to nonexistent Page]],
[http://forge.mysql.com  External Link],
http://www.mysql.com  (plain text link)

== A Second Level Heading ==
The first paragraph text.

== A Second Level Heading ==
The first paragraph text.
* List Item Line 1
* List Item Line 2
* List Item Line 3
=== A Third Level Heading ===
The first paragraph text.
* List Item Line 1
** Sub Item 1
** Sub Item 1
** Sub Item 1
*** Sub Item 1
*** Sub Item 2
*** Sub Item 3
* List Item Line 2
* List Item Line 3
=== A Third Level Heading ===
The first paragraph text.
# List Item Line 1
## Sub Item 1
## Sub Item 1
## Sub Item 1
### Sub Item 1
### Sub Item 2
### Sub Item 3
# List Item Line 2
# List Item Line 3
== A Second Level Heading ==
The first paragraph text.

== A Second Level Heading ==
The first paragraph text.

==>

5. Table of Contents

By default, the Table of Contents shows at the top of a page when a given amount of sections or content is present (not sure what the trigger is). The issue is, for larger pages, it means you need to scroll down before you can see any page content. You can disable this with the __NOTOC__ option, but a better solution is to position the Table of Contents so as to not interfere with initial content.

#toc        { float: right; margin: 0 0 1em 1em; border: solid 1px #888888; text-color: #EFEFEF; color: #333333; }
#toc td     { padding: 0.5em; }
#toc .tocindent
            { margin-left: 1em; }
#toc .tocline
            { margin-bottom: 0px; }
#toc p      { margin: 0; }
#toc .editsection
            { margin-top: 0.7em;}
==>

Ok, we are about half way there.

6. Menu Options

In order to get a look and feel like the Forge Home Page., we now have to work on the rest of the navigation options at the top of the page above the content. Let’s start with Second Line of Menu Options (I’ll explain more later why).

#p-cactions { padding-right: 0px; margin-right: 0px; background-color: #006486; width: 100%; top: 77px; }
#p-cactions ul
            { margin: 0; padding: 0; list-style: none;  font-size: 85%; margin-left: 10px; }
#p-cactions li
            { float:left; margin:0; padding:0; text-indent:0; border-width: 0px; }
#p-cactions li a
            { display:block;  color:#F7F7F7;  font-weight: bold;
              background-color: #666666;  border:solid 1px #DDDDDD;
              border-width: 0px;  border-left-width:1px;  text-decoration:none; white-space:nowrap;}
#p-cactions li a:hover
            { background-color: #FFBC2F; color: #666666; }
#p-cactions li.selected a
            { background: #EA7F07;  border:none;}
#p-cactions li.selected  a:hover
            { color: #000000; }
==>

7. Top Menu Options

#p-personal .pBody
            { background: #FFFFFF url(http://forge.mysql.com/img/bggradient.jpg) no-repeat top right; }

#p-personal li a,
#p-personal li a.new
            { color: #FFFFFF; text-decoration: none; font-weight: bold; }
#p-personal li a:hover
            { color: #E97B00; background-color: transparent; text-decoration: underline; }
==>

The down side is it should be the same height at the Forge Page. This required a little more work, and other sections had to be adjusted accordingly, hence why I left this to last. (The size is based on the later mention logo height + margins)

#p-personal { height: 62px; }
#p-personal .pBody
            { height: 62px; }
#p-cactions { top: 62px; }
#content    { margin-top: 84px; }

==>

8. Some Miscellaneous Things

  • Deemphasis the text in the footer a little
  • Remove the annoying arrow that occurs after external links
  • Remove the logo from the login link
  • Hide the My Talk link
#footer     { color: #888888; }
#bodyContent a[href ^="http://"]
            { background:inherit !important; padding-right:inherit !important}
li#pt-userpage
            { background: none; }
li#pt-mytalk
            { display: none; }

==>

The Badging

Let us not forget the final step, the logo badging.

This requires a change to a MediaWiki filesystem file

LocalSettings.php.

$wgLogo             = "http://forge.mysql.com/img/mysqllogo.gif";

And the following Style changes.

#p-logo, #p-logo a, #p-logo a:hover
            { width:100px; height: 52px; }
#p-logo     { margin-left: 10px; margin-top: 5px; margin-bottom: 5px; }

#p-cactions { left: 0px; }
#p-cactions ul
            { margin-left: 180px; }

In order to overcome the top options bleeding to white text on white background, I’ve increased the right side fill of the default bggradient image, replacing the appropriate ULR with the following.

#p-personal .pBody
            { background: #FFFFFF url(/images/bggradient.png) no-repeat top right; }
==>

Conclusion

It’s not quite perfect yet, but this shows how it can be done. Some minor things are left, but I’ve run out of time for the few hours I allocated to this.
The end result of monobook.css for this lesson is:

/* edit this file to customize the monobook skin for the entire site */

/* Background Display */

body        { background-image: none; background-color: #FFFFFF;}
.portlet    { background-color: #FFFFFF; }

/* Borders */

#content    { border-width: 0px; }
.portlet .pBody
            { border-width: 0px; }
#footer     { border-top: 1px solid #888888; border-bottom-width: 0px; }

#p-navigation,
#p-search,
#p-tb       { border-top: 1px dotted #888888; }

/* Links */
a:link      { color: #10688E; text-decoration: none; }
a:hover     { color: #003366; text-decoration: underline; }
a:visited   { color: #106887; text-decoration: none; }

a.new:link  { color: #AA0000; text-decoration: none; }
a.new:hover { color: #AA0000; text-decoration: underline; }

a.external:link {color: #000000; text-decoration: none; }
a.external:hover { color: #AA0000; text-decoration: underline; }


/* Page Look & Feel */
html,body,p,td,a,li
            { font: 12px/19px Verdana, "Lucida Grande", "Lucida Sans Unicode", Tahoma, Arial, sans-serif; }

h1          { font: bold 24px Helvetica, Arial, sans-serif; color: #EB694A; letter-spacing: -1px;
              margin: 1.5em 0 0.25em 0;
              border-bottom: 1px dotted #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h2          { font: bold 18px Helvetica, Arial, sans-serif;  color: #EB694A; letter-spacing: -1px;
              margin: 2em 0 0 0;
              border-bottom: 1px dotted  #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h3          { font-size: 12px; color: #6F90B5; }

h4          { font-size: 12px; color: #6F90B5; }

/* Table of Contents */
#toc        { float: right; margin: 0 0 1em 1em; border: solid 1px #888888; #EFEFEF; color: #333333; }
#toc td     { padding: 0.5em; }
#toc .tocindent
            { margin-left: 1em; }
#toc .tocline
            { margin-bottom: 0px; }
#toc p      { margin: 0; }
#toc .editsection
            { margin-top: 0.7em;}


/* Second Line Top Menu Options */

#p-cactions { padding-right: 0px; margin-right: 0px; background-color: #006486; width: 100%; }

#p-cactions ul
            { margin: 0; padding: 0; list-style: none;  font-size: 85%; margin-left: 10px; }

#p-cactions li
            { float:left; margin:0; padding:0; text-indent:0; border-width: 0px; }

#p-cactions li a
            { display:block;  color:#F7F7F7;  font-weight: bold;
              background-color: #666666;  border:solid 1px #DDDDDD;
              border-width: 0px;  border-left-width:1px;  text-decoration:none; white-space:nowrap;}

#p-cactions li a:hover
            { background-color: #FFBC2F; color: #66666; }

#p-cactions li.selected a
            { background: #EA7F07;  border:none;}

#p-cactions li.selected  a:hover
            { color: #000000; }

/* Top Menu Options */
#p-personal .pBody
            { background: #FFFFFF url(/images/bggradient.png) no-repeat top right; }

#p-personal li a,
#p-personal li a.new
            { color: #FFFFFF; text-decoration: none; font-weight: bold; }
#p-personal li a:hover
            { color: #E97B00; background-color: transparent; text-decoration: underline; }

/* Top Menu Height Adjustments */
#p-personal { height: 62px; }
#p-personal .pBody
            { height: 62px; }
#p-cactions { top: 62px; }
#content    { margin-top: 84px; }

/* Minor Things */
#footer     { color: #888888; }
#bodyContent a[href ^="http://"]
            { background:inherit !important; padding-right:inherit !important}
li#pt-userpage
            { background: none; }
li#pt-mytalk
            { display: none; }

/* Badging */
#p-logo, #p-logo a, #p-logo a:hover
            { width:100px; height: 52px; }
#p-logo     { margin-left: 10px; margin-top: 5px; margin-bottom: 5px; }

#p-cactions { left: 0px; }
#p-cactions ul
            { margin-left: 180px; }

References

User Styles Alternative
Gallery of user styles
Skins
LocalSettings.php Style Information
Navigation Bar
User Rights
Wikipedia Monobook
MediaWiki Skin HowTo

UltimateLAMP

This product is no longer available. This was created over 6 years ago and software is too out of date.


As I discussed earlier in A picture can tell a thousand words, I outlined briefly what the intention of UltimateLAMP was for. Let me spill the beans so to speak.

What is UltimateLAMP?

UltimateLAMP is a fully functional environment allowing you to easily try and evaluate a number of LAMP stack software products without requiring any specific setup or configuration of these products. UltimateLAMP runs as a Virtual Machine with VMware Player (FREE). This demonstration package also enables the recording of all user entered information for later reference, indeed you will find a wealth of information already available within a number of the Product Recommendations starting with the supplied Documentation.

My executive punch line with the “right” buzz words is:


You have heard of all the hype in Open Source with lowering Total Cost of
Ownership (TCO) or Total Economic Impact (TEI)? Evaluate Open Source now
in your organistion at no cost or risk with this software package.

What are the uses for UltimateLAMP?







Well, in a nutshell UltimateLAMP allows for instant exposure of Open Source products that use MySQL. You have to remember my goal here is to promote MySQL. Unfortunately, as with any database the features alone will not get the new sale, it’s the applicability to an organisation, and with Open Source the wealth of existing and developing applications that exist can. I don’t necessarily like this approach. Indeed many open source products are poorly designed and can be poor choices in a large scale enterprise solution, but the flip side is, where else can you start.

The benefits of selling a MediaWiki for example due to the success and scalability with Wikipedia is great. So it’s important that the product list includes proven products and currently developing products (rather then stale ones). This is something that the community can definitely provide valuable feedback on to help in this selection.

Other then becoming a CD used as a drink coaster, I feel the potential is here already to provide a copy to people, even install it on a managers computer. You can’t break the software, so why not install it for your non-computer user friends/family. The goal is to move up to executive management however I feel the exposure to the general community first will greatly help.

How it came about

There is some history to this idea. Here are some of the highlights.

  • Late 2005, several planning sessions with Jon and Morgan about a more practical Open Source Contribution user group lead to obviously LAMP stack products for simplicity and exposure. This lead to exposure of LAMP stack products to more of the general person and split from the original intended goal, but was a great idea.
  • Early 2006, futher discussions of how MySQL could get exposure and traction into organisations. For myself professional, how could I promote in industry sectors that I work in.
  • In April 2006, the MySQL Users Conference with discussions of this idea with others and the positive feedback
  • In May 2006, the VMware Appliance Challenge was the possible exposure and deadline needed for me to “Just Do It”. Originally the idea was intended as a Live CD, but in some ways a virtual machine is just as good.

Where to from now!

Well, supply and demand. I don’t know if anybody else has a particular use, or will even download to use it, or market it.

I could see the potential for MySQL User Groups to get behind my idea, and enable members to filter this into known organisations. I could see for targeted opportunities/events, CD’s or information could be distributed. With the support and backing of MySQL AB, I could see the opportunity for even a breakfast CEO/CTI/CIO introduction or other format of meeting the ultimate intended audience.

I could see the potential that an organisation or entity could provide free hosting (30 days) to an organisation that pointed a 3rd level domain to the provider (e.g. wiki.acme.com). It’s like the honeypot, if it takes off after 30 days, the company will either want to pay for hosting, or what to move it. And that’s also fine, organisation provides a MySQL dump, and links to documented installation instructions, or perhaps a sale of services for initial installation/customisation/training can be made.

What can I do?

There is always a list of things that can be done. For now the greatest thing I can ask for is feedback. The good, the bad and the ugly. It’s find to get the comments to say, “That’s great”, or “Good job” or “I can use this”, it’s just as important to get the comments that are proactive in what’s not good. I would value any feedback. Please feel free to Download UltimateLAMP

On my immediate ToDo List or even partially complete is:

  • Document VMPlayer installation instructions for Windows/Linux (partial).
  • Add more product sample content.
  • Add more mediawiki content about the product, like customisation options, references to specific documentation, or other online working examples.
  • Documenting the installation/creation instructions for individual products.
  • Figure out a better way for users to contribute content that get’s rolled back into the Virtual Machines. For the mediawiki, I could see a public online copy, but for other products it could become harder
  • Optimise VM image (removing unnecessary OS stuff), removing product language support (not ideal), but my goal is to provide a 2 CD pack. The first CD has VMplayer in Win/Linux/RPM formats and the default VMware BrowserAppliance (All software from VMWare). The second CD is UltimateLAMP. Combined in a DVD 2x case along with perhaps a small booklet of a few pages, would enable this product to potential move to a commercial state.
  • Investigating other products

Should anybody wishing to help, leave a comment, that way I can see somebody is reading this and so can others.

Related Post:

UltimateLAMP Passwords

A picture can tell a thousand words

I’m a keen advocate of MySQL. However, while I use it and promote it within my limited IT circles, I often wonder how MySQL can get better traction and exposure, especially within both the industry sectors and physical locations where I am presently.

This presents a dilemma, it’s almost like the term that has been used in Venture Capital, and in the well named book, Crossing the Chasm. I see and believe that MySQL already has good penetration within certain industry sectors, specifically Internet Based, Startup Based, or Small Based segments. However, I’m sure within other commercial sectors, MySQL has either a token exposure or little to no exposure at all, at least in the circles I mix with.

So how can MySQL the product and MySQL AB the company get both better exposure and penetration? Ultimately it’s great for the community, both in dollars driving product features, product support and from my interest, more jobs. I figure there are many different approaches, and they all target organisations, and the individuals making decisions within the organisations differently. You have the 24×7 support with MySQL Network and certified installations that can satisfy management, you have the speed, flexibility, performance and capability that can appeal to a DBA. But this I doubt is the ultimate hurdle.

How do you do the active sell in 60 seconds to a potential cold contact, or how do you as an MySQL Open Source advocate within an organisation do the active or passive sell to a CEO/CTO/CIO/IT Manager etc.

Well I have an idea, it’s called UltimateLAMP. It’s not the ideal solution, it’s not the great sell, but I figure it’s one approach that I can contribute to and promote.

Stay tuned, more to come very soon.

MySQL GUI Products

I’ve started now to actively use more of the MySQL Workbench, MySQL Administrator and MySQL Query Browser and MySQL Migration Toolkit. I am traditionally a very command line person, and it’s important that these skills are never put on the back burner. For all those budding Developers and DBA’s you need these skills, expecially for any MySQL Certification.

To indicate my indent, I’ve even created a new blog category specifically for MySQL GUI Products.
As I’ve mentioned previously, Mike Zinner from MySQL AB really impressed me with where the products were going in development. I’m still rather disappointed about the stability and compatibility when attempting to run under Linux (which is my desktop), but I’ve put that aside and configured a suitable environment to best use and test these products under Windoze! I’ll also be able to trial products under both environments and provide valuable feedback to Workbench Bugs.

To overcome this, as I use Windows reluctantly in a VMware environment, I’ve decided to install all MySQL GUI Products on an older laptop (a Dell Inspiron 5000, PIII 600MHz), which I use for media playing (DivX .avi’s) on my TV. Yes, I hate to resort back to windows, but I had difficulties getting the TV Out working properly under Linux, and I only have so much free time. I would have liked to nut it out, but I find Windows superior in only 2 things, driver compatibility and running Photoshop, so I’ll leave it at that.

So being about to VNC to this machine gives me a spread of processing usage which is acceptable for now, and I’m working from a single desktop. (Time to dream some more about that Dell 24″ UltarSharp Widescreen LCD Flat Panel I’d like).

Well again, with just starting MySQL Workbench, and for the first ever time launching MySQL Administrator I came across Windows functionality I’d not seen before. Within MySQL Administrator under the Windows menu option I was able to switch to MySQL Workbench, a completely different running Windows Program. Now, I’d not seen this before in any multiple running products in Windows. Obviously the power of the GRE enables this, what a nice feature. Well back to now looking at this product some more.

WordPress Blog Upgrade Time

Time to Upgrade my WordPress Blog software from Version 1.5.2 to Version 2.0.2 after my latest spam attacks and Combating Blog Spam attempt.

Here is what I did.

cd /home/arabx/www
tar cvfz blog.backup.20060520.tar.gz wordpress-1.5.2
mysqldump -uroot -p arabx_blog > blog.db.20060520.sql
mv blog.*20060520* /u01/backup   # Not in my WWW
scp blog.*20060520* to@someplacesave:/backup
wget http://wordpress.org/latest.tar.gz
mv latest.tar.gz    wordpress-2.0.2.tar.gz # I really hate unversioned files

Disable Plugins. You had to do this manually from the Admin interface (I’m sure it’s just a SQL statement).
Goto http://blog.arabx.com.au/wp-admin/plugins.php

Now some more work.

tar xvfz wordpress-2.0.2.tar.gz  # creates a wordpress directory
mv wordpress wordpress-2.0.2
ln -s wordpress-2.0.2 blog
cp wordpress-1.5.2/wp-config.php wordpress-2.0.2

Back to the Web page again. So much for unattended upgrades?

http://blog.arabx.com.au/wp-admin/upgrade.php

Follow the Steps (step actually). That all worked, now to restore all the additions?

cp -r wordpress-1.5.2/wp-content/themes/arabx wordpress-2.0.2/wp-content/themes/
cp -r wordpress-1.5.2/wp-content/plugins/eventcalendar3* wordpress-2.0.2/wp-content/plugins/

Reactivate my plugins at http://blog.arabx.com.au/wp-admin/plugins.php
Change to my Theme at http://blog.arabx.com.au/wp-admin/themes.php

So my blog page http://blog.arabx.com.au/ looks much like it did previously. Time now to review some of the new features and plugins, expecially steps to combat spam.
References
WordPress Official Upgrade Page
Upgrade to WordPress 2.0
What’s New in WordPress 2.0?

MySQL :: Developer Zone Quick Polls

I don’t get to the MySQL Developer Zone main page often enough. In thinking about what pages I view everyday or regularly, it doesn’t rate as high as Planet MySQL, MySQL Forums or even the MySQL Forge.

I was most dissappointed in the results of a recent poll What did you think of the 2006 Users Conference?. The top response was I had no idea there was a Users Conference. That’s not good to see this.

An interesting poll What are you most looking forward to at the MySQL Users Conference (April 24-27)?, the clear winner was Drinking beer with MySQL gurus. What does this say about the attendees. Either they are all alcho’s or the just want to be around guru’s in a less technical way.

I see this page also has a live feed of Planet MySQL. Perhaps we should get some more stuff down the right side of PlanetMySQL like the current Quick Poll itself and a feed of the current developer articles at the Developer Zone.

A Post MySQL Conference review. The 4 F's

Finally back home after some R&R at Yosemite before leaving the US. In conclusion, to sum up my experience of the 4th Annual MySQL Users Conference “Excellent”.
Here’s my take. Friends, Functionality & New Features, the Future.

Friends

I’ve used MySQL now for over 6 years, and full time for a number of years, yet I’ve only become active in the MySQL community, particularly Planet MySQL in the past 6 months. Over that time, I’ve read a lot from members, and heard from many people. It was great at the conference to meet many of these people for the first time. The list includes: Community MembersFrank Mash,Mike Kruckenberg, Markus Popp, Roland Bouman, Giuseppe Maxia and Paul McCullagh. MySQL EmployeesMike Hillyer, Colin Charles, Jay Pipes, Mike Zinner, and New ContactsKristian Köhntopp, Jeremy Cole, Sheeri Kritzer, Taneli Otala, Laura Thompson just to start the list.

Functionality

Not only was there plenty of discussion on Server Functionality, there was plenty of MySQL Client functionality including the MySQL Workbench, MySQL Migration Toolkit and the other MySQL GUI products.
There were a number of discussions on uses and implementations of MySQL in large web deployments. It would be great to see some more white papers here.

New Features

A few months ago I wrote an article A call to arms!. In some part, I was just giving my opinion and hoping to gee up some support and feedback from the community. Well, the MySQL 5 Pluggable Storage Architecture got a great boost with announcements of transactional storage engines Falcon by Jim Starkey, Solid and PBXT. Add details of InnoDB New Features, MyISAM additions, and indications of other wonderful if not entirely practical options. I’m sure there is much more in stall to come this year that wasn’t discussed.

A number of talks featured Cluster including Monday’s tutorial, and with 5.1 and beyond I can see next year there will be more discussion on successful Cluster implementations. There was a lot of talks about Scaling out. I’d like to see more practical examples, perhaps a detailed tutorial.

The Future

What does the future hold for MySQL? The MySQL Server and Storage Engine Roadmap provided an insight of the upcoming planned features and releases over the next 2 years. Of course, the marketplace can change quickly, and MySQL is in a great position to react to the needs of the community quickly.

And before your know it, the 5th Annual MySQL Conference will be in play.

Conference Feedback

One thing I had a chance to discuss with Jay Pipes after the conference, I wasn’t the first to mention, and plans are already in motion, was a number of talks just needed more time. Moving the schedule to 55 minute talks gives that extra time, even if it is open question time from the floor, but it also makes knowing when sessions are on much easier, if they always start at the top of the hour.

In Conclusion

Frank (a.k.a Farhan Mashraqi) asked me what session I liked the most? Hard to say. Agile Database Techniques: Data Doesn’t Have to be a Four-Letter Word Anymore rated very highly, as the content was close to heart and my expertise. HackFest B: Creating New SHOW Commands by Brian Aker, showed just how easy it was to get into the MySQL source. Of course the internals are much more complex then this, but it was a good introduction. My favourite keynote was The Ubuntu Project: Improving Collaboration in the Free Software World. There were a number of talks I was disappointed in, as well as a number I didn’t get to due to 8 sessions in parallel.

I would have to say, that what impressed me most was no one single talk, but the functionality of the GRT Shell that Mike Zinner and his team have built into the GUI product line. I was very impressed, and I could see this providing extensive functionality and not just MySQL specific centric tools. This will be area I’ll be focussing on my contributions in the near future.