Eclipse CVS Tutorial 1

Repository Management


  • Start Eclipse
  • Window | Open Perspective | CVS Repository Exploring
    • This will add an icon to the quick dock option. See Point 1
    • This should open the CVS Repositories View. See Point 2. If not, you can do Window | Show View | CVS Repositories
  • In the CVS Repositories View, Right Click | New | Repository Location
  • Enter the following details on the Add CVS Repository window
    • Host: cvs.arabx.com.au
    • Repository Path: /home/repository/cvs
    • User: [username]
    • Password: [password]
    • Connection Type: pserver
    • Finish




Project Management

  • Within the CVS Repositories View, you can see the appropiate CVS repositories
  • Expand the repository, and then HEAD (Point 1) to get a list of projects within the CVS Repository (Point 2).
  • The simpliest way to get a CVS project into Eclipse, is to Right Click on the CVS Project, and select Checkout
  • In larger projects, you would always do Right Click, Checkout Out As, this gives you a dialog.
    • At the Check Out At dialog, normally you would select the appropiate project name (Point 1), normally the project plus some indicator of tag version or branch. Click Next (Point 2)
    • Next
    • Now is when you normally select an appropiate branch or version. In this case, you could expand Branches or Versions (Point 1). I’ve simply created some dummy versions (BUILD_100, BUILD_101) (Point 2). Choice One, Click Finish (Point 3)
  • Now you should be in the Navigator View, if not, Select Window | Show View | Navigator.
    • Expanding out the project, will notice the following.
    • The Project has a decorator of the Repository (Point 1), and generally the branch details if applicable
    • Individual files all have a Revision Decorator (Point 2)
    • New Files when created have a > indicator (Point 3), and this propogrates to each parent directory and project

NOTE: These CVS repositories are just test areas, they are not used for any production systems.

What is a BLOB?

If your answer was Binary Large Object, then you would be wrong. Check out Mike’s comments on History of Database Blobs from the man that created the BLOB, Mr Jim Starkey.

I had the chance to meet Jim a few months ago, and I spent quite some time talking on various database topics. He is one cool guy.

Mercurial Version Control Software

I got asked (being a Java developer) about what was involved in creating an Eclipse Plugin for Mercurial. Well in true Google style, why invent when somebody probably already has. A quick check finds Mercurial Eclipse by VecTrace.

Now until last week, I’d never heard of Mercurial, so this is really an introduction to somebody that has no idea.

What is Mercurial?

Mercurial is a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.

Ok, so big deal, I use CVS. I also use Subversion (SVN) for my Apache contributions, and also for MySQL GUI products. Why do we need another Version Control Product? Mercurial is a Distributed Software Configuration Management Tool. The following is from the Mercurial Wiki

A distributed SCM tool is designed to support a model in which each Repository is loosely coupled to many others. Each Repository contains a complete set of metadata describing one or more projects. These repositories may be located almost anywhere. Individual developers only need access to their own repositories, not to a central one, in order to Commit changes.

Distributed SCMs provide mechanisms for propagating changes between repositories.

Distributed SCMs are in contrast to CentralisedSCMs.

So, clearly a distributed model would well in a large distributed organisation where external factors limit continous access to a single central repository. Low Bandwidth, Poor Internet Connectivity, being on a plane, and travelling are all things that would make a distributed model a more ideal solution. I know I’ve taken my laptop away, and being an “Agile Methodology” developer, I commit often. When you have several days of uncommitted work it goes against the normal operation.

You can get more information at the official website at http://www.selenic.com/mercurial. A few quick links are: Quick Start Guide, Tutorial, Glossary.

Installing Mercurial

su -
cd /src
wget http://www.selenic.com/mercurial/release/mercurial-0.9.tar.gz
tar xvfz mercurial-0.9.tar.gz
cd mercurial-0.9
# NOTE: Requires python 2.3 or better.
python -V
python setup.py install --force

A quick check of the syntax.

$ hg
Mercurial Distributed SCM

basic commands (use "hg help" for the full list or option "-v" for details):

 add        add the specified files on the next commit
 annotate   show changeset information per file line
 clone      make a copy of an existing repository
 commit     commit the specified files or all outstanding changes
 diff       diff repository (or selected files)
 export     dump the header and diffs for one or more changesets
 init       create a new repository in the given directory
 log        show revision history of entire repository or files
 parents    show the parents of the working dir or revision
 pull       pull changes from the specified source
 push       push changes to the specified destination
 remove     remove the specified files on the next commit
 revert     revert files or dirs to their states as of some revision
 serve      export the repository via HTTP
 status     show changed files in the working directory
 update     update or merge working directory

A more detailed list:

$ hg help
Mercurial Distributed SCM

list of commands (use "hg help -v" to show aliases and global options):

 add        add the specified files on the next commit
 annotate   show changeset information per file line
 archive    create unversioned archive of a repository revision
 backout    reverse effect of earlier changeset
 bundle     create a changegroup file
 cat        output the latest or given revisions of files
 clone      make a copy of an existing repository
 commit     commit the specified files or all outstanding changes
 copy       mark files as copied for the next commit
 diff       diff repository (or selected files)
 export     dump the header and diffs for one or more changesets
 grep       search for a pattern in specified files and revisions
 heads      show current repository heads
 help       show help for a given command or all commands
 identify   print information about the working copy
 import     import an ordered set of patches
 incoming   show new changesets found in source
 init       create a new repository in the given directory
 locate     locate files matching specific patterns
 log        show revision history of entire repository or files
 manifest   output the latest or given revision of the project manifest
 merge      Merge working directory with another revision
 outgoing   show changesets not found in destination
 parents    show the parents of the working dir or revision
 paths      show definition of symbolic path names
 pull       pull changes from the specified source
 push       push changes to the specified destination
 recover    roll back an interrupted transaction
 remove     remove the specified files on the next commit
 rename     rename files; equivalent of copy + remove
 revert     revert files or dirs to their states as of some revision
 rollback   roll back the last transaction in this repository
 root       print the root (top) of the current working dir
 serve      export the repository via HTTP
 status     show changed files in the working directory
 tag        add a tag for the current tip or a given revision
 tags       list repository tags
 tip        show the tip revision
 unbundle   apply a changegroup file
 update     update or merge working directory
 verify     verify the integrity of the repository
 version    output version and copyright information

Mercurial Eclipse Plugin

The plugin is still in it’s early days, but the “FREEDOM” of open source enables me to easily review. After a quick install and review of docs, I shot off an email to the developer, stating why I was looking, and while I have other projects on the go, I asked what I could do to help. It’s only be 2 days and we have already communicated via email several times on various topics. That’s one reason why I really love the Open Source Community. Generally people are very receptive to feedback, comments and especially help.

Within Eclipse

  • Help ->Software Updates-> Find and install…
  • Select “Search for new features to install”, click Next
  • Click “New Remote site…”
  • Enter following details and click Ok
    • Name: MercurialEclipse Beta site
    • URL: http://zingo.homeip.net:8000/eclipse-betaupdate/
  • Follow the prompts to accept the license and download.

So now with Eclipse, on a project you can simply go Right Click -> Team -> Share Project -> Select Mercurial

A Quick Mercurial Tutorial

Of course the quickest way to learn about using Mercurial is to look at an existing product. So taking this plugin project for a spin.

$ cd /tmp
$ hg clone http://zingo.homeip.net:8000/hg/mercurialeclipse com.vectrace.MercurialEclipse
$ hg clone com.vectrace.MercurialEclipse example
$ cd example
# Create some new dummy files
$ touch test.txt
$ touch html/test.html
# View files against respository status
$ hg status
? html/test.html
? test.txt
# Add the new files
$ hg add
adding html/test.html
adding test.txt
# Commit changes
$ hg commit -m "Testing Mercurial"

So other then the second clone command (which enabled me to not mess up the original repository and to test distributed handling next), this is just the same as CVS (checkout, diff, add, commit)

# The Distributed nature involves first Pulling from the "upstream" respository
$ hg pull ../com.vectrace.MercurialEclipse
pulling from ../com.vectrace.MercurialEclipse
searching for changes
no changes found
# Confirm our new file is not in "upstream" respository
$ ls ../com.vectrace.MercurialEclipse/test.txt
ls: ../com.vectrace.MercurialEclipse/test.txt: No such file or directory
# Push local respository changes to "upstream" respository
$ hg push ../com.vectrace.MercurialEclipse
pushing to ../com.vectrace.MercurialEclipse
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 2 changes to 2 files
$ ls ../com.vectrace.MercurialEclipse/test.txt
ls: ../com.vectrace.MercurialEclipse/test.txt: No such file or directory

Hmmm, missed something here. The Quick Start Guide docs seems to want to work back into the original respository, pulling in changes from the “downstream” repository, then executing a merge and commit command. I don’t want to do this just in case I start messing up the original repository. Time for me to create a new standalone repository so I don’t screw anything up. Stay tuned for an updated tutorial.

Of course this is very basic, I need to look into commands used in CVS like login, tagging (versioning), branching, diffing revisions and merging but this is a start.

I do have a concern with the number of distributed respositories, when and how do you know you are in sync with the “master”, indeed what is the “master”. It does merit some investigation to see what management is in place, like identifying all respositories, and comparing for example.

Conclusion

Ok, so now I’ve got a grasp on Mercurial, time to review the Java Code and see what works and what doesn’t in the Eclipse environment. Of course I don’t use Mercurial so what I may consider as functionality required, may be lower priority to those users out there. Any feedback I’m sure will be most appreciated by the original developer.

What is software quality?

Greg Lehey wrote today Is MySQL getting buggier?. The underlying question of his comments is a more fundamental and passionate topic, and especially for me. That is “Software Quality”.

The quintessential question is this. “How do you determine the ‘software quality’ of a product?” And then quickly followed by, “How do you benchmark this with other software products?”

The short answer to second question is simple. You can’t. The reasons why become apparent in addressing the first question. (There’s a mathematical term for this two question situation, another one of the million things to research and remember one day).

15 years ago as part of my masters research I worked on “Improving Software Quality and Software Productivity”. At the time when I started, I found that these were generally considered mutually exclusive. Quite simply, to improve software quality was to decrease productivity, and when you had to improve productivity, quality declined. Unless you had a clear end goal (which you can’t in software, features, cost, satisfaction etc.), it was impossible to measure the total “cost”. It was also impossible to clearly determine impact on key factors, as with software you never have a “control group” working in isolation of other groups trying different things, and being able to do imperial comparison analysis.

In summary, I found that to improve software quality and software productivity you had to reconsider the approach at several different levels.

  1. You had to strive to work towards a simpler business solution, rather then a more complex solution.
  2. You had to leverage converting business knowledge that was held in lengthy and often dated paper documents, into electronically managable content. Some of the benefits included concurrency, consistency and comparative analysis of the knowledge. The early days of CASE repositories were built on this principle, again before they became to complex.
  3. You also had to look at better ways of writing code, and the area I focussed on was code generation. That is “Code writing Code”. (I have a very successful story of 400,000-500,000 Lines of C code. The ODT was a specialised intelligent peer-to-peer selective replication, between a central repository and 32 distributed sites, allowing for independent heterogeneous operations. It provided levels of complexity within replication including two-way master of partial database data, selective row replication between nodes depending on business needs, and collision management. And I did this is1992. A story for another day).

I also found in the business and industry sectors of research, it was near impossible to get a commonality of some components, a higher level of re-use, a higher value of cost savings. There were many factors for this, another topic for another time.

Now that was a long time ago, and a number of things have changed greatly, including my views. These are far more prevalent regarding Open Source.

Some historical general principles in measurement don’t apply today in Open Source. I remember reading recently in the browser wars, that FireFox was “too honest”. Being more open to acknowledge problems/bugs and correcting them, then the Microsoft perspective. How do you compare quality in this situation. (Side Note: I wish there was a search engine that would search all pages that you had viewed, bookmarked etc., rather then searching the WWW for everything. Would make life much easier in finding references like this.)

Open Source does something commercial software can’t do. If you find a bug, and you can’t wait, then fix it yourself, or at worse pay somebody to fix it for you. The scope, functionality and rules are constantly changing, the resourcing and priorities as well. How do you adequately measure something so fluid.

I am a proponent of “Agile Development Methodologies”, having use for many years eXtreme Programming (XP) either rather seriously, or taking small attributes and applying to existing infrastructures. Terms include Test Driven Development (TDD), Feature Driven Development (FDD), Code Coverage, Unit Tests.

Speaking just on Unit Tests. You write a Unit Test before you write your code. It helps to understand what you are trying to achieve before you try to achieve it. It also provides coverage for when the rules change, you write new or changed tests, which allows you to have analysis performed by your code to address problems. I’ve found that writing extensive tests become a self documentation system of the functionality, it highlights what can and can’t occur. Simply ‘grep’ing can offer a clear summarised description of code, even if you have now knowledge of it. In Java you write long method names (some abstract examples would be ‘testFunctionDoesThisAndNotThis”, “testFunctionAcceptsTwoOfThese”, “testFunctionFailsWithValuesOfThis” etc.) I need to provide a more concrete example to better explain.

An Agile approach, also tends to better encapsulate the Web 2.0 principle of considering software as a service, and not a product. This entirely changes the general thought of software development, of the large cost of software development, followed by a maintenance period and cost of the product, followed by a review of potential new products as circumstances change, followed by … (you get the picture)

If you consider an acceptable ongoing cost for providing a service, perhaps pegged against other business revenue/expenses, and then your product was managed by what available resources you had (time, money, people), rather then by functionality requirements you entire perspective changes. You include a high level of user ownership, the release often principle, and the capacity to review and adapt to changing and improving technology and requirements, all these factors change the outcome on how software is development, managed and perceived.

The other thing about all this is, It’s common sense. It’s a simpler approach, and if everybody considered “How can I make this simpler?” first, it would change they way software is developed and perceived.

So getting back onto the topic of Software Quality. I found that in the right circumstances (and there are wrong circumstances), imploring a change in principle towards Agile concepts can have a significant effect on Software Quality. But, Beauty is in the eyes of the beholder. Software Quality is also much the same. What may be acceptable for somebody is unacceptable for somebody else. How then do you measure this?

So after all this, we are left with the question. “How do you determine the ‘software quality’ of a product?” Well, we can only consider a single product in isolation, and we are talking the MySQL server. I should note that you can’t consider all MySQL products in one analysis, different factors including the relative age of the product, the technologies etc. all affect the measurement. You need to consider each product individually.

So why is the number of bugs, or number of open bugs a measure of software quality. What about how big is the product (e.g. lines of code), how complex is the product, how mature is the product, how tested is the product. What we considered the lifespan of critical bugs as a measurement? So what if there were a larger number of critical bugs, but they were fixed immediately is that a good or poor reflection on software quality. If software is in the Web 2.0 “Perpetual Beta”, then are bugs really incomplete functionality.

In software and for organisations, I think an approach could be similar as mentioned by a prospective employer recently. “We know things aren’t the best they could be, but they are better then they were. If our goal could be to improve just 1% each day, then we would be easily 100% better in 6 months.” So with whatever metrics were in place (and you have to have some form of measurement), if there is always a continual improvement, then software quality is always improving.

Generating an internal SSL Certificate (for tomcat)

How to Generate an internal SSL certificate

Create the self-signed keystore

$ su -
$ URL="your.url.here";export URL
$ cd /opt/tomcat/conf
$ keytool -genkey -alias ${URL} -keyalg RSA -keystore ${URL}.keystore
Enter keystore password:  changeit
What is your first and last name?
  [Unknown]:  your.url.here
What is the name of your organizational unit?
  [Unknown]:  IT
What is the name of your organization?
  [Unknown]:  your.url.here
What is the name of your City or Locality?
  [Unknown]:  Brisbane
What is the name of your State or Province?
  [Unknown]:  QLD
What is the two-letter country code for this unit?
  [Unknown]:  AU
Is CN=your.url.here, OU=IT, O=your.url.here, L=Brisbane, ST=QLD, C=AU correct?
  [no]:  yes

Enter key password for <your.url.here>
        (RETURN if same as keystore password):

Turn the keystore into a X.509 certificate

$ keytool -export -alias ${URL} -keystore ${URL}.keystore -rfc -file ${URL}.cert
Enter keystore password:  changeit
Certificate stored in file <your.url.here.cert>

Delete existing trusted certificate

$ keytool -delete -alias ${URL} -file ${URL}.cert  -keystore /opt/java/jre/lib/security/cacerts  -storepass changeit

Import the certificate into cacerts – JRE trusted certificates

$ keytool -import -alias ${URL} -file ${URL}.cert  -keystore /opt/java/jre/lib/security/cacerts  -storepass changeit
Owner: CN=your.url.here, OU=IT, O=your.url.here, L=Brisbane, ST=QLD, C=AU
Issuer: CN=your.url.here, OU=IT, O=your.url.here, L=Brisbane, ST=QLD, C=AU
Serial number: 44ab628c
Valid from: Wed Jul 05 01:56:12 CDT 2006 until: Tue Oct 03 01:56:12 CDT 2006
Certificate fingerprints:
         MD5:  EC:76:01:04:7F:FC:21:CC:A8:41:AD:86:C8:B2:D5:6D
         SHA1: 2D:FD:7C:56:65:70:36:1B:1D:71:09:41:84:98:E6:8E:89:18:BC:18
Trust this certificate? [no]:  yes
Certificate was added to keystore


If you replaced an existing certificate you will need to restart Tomcat.

Differences in syntax between mysql and mysqltest

As I wrote earlier in Using the MySQL Test Suite I found an issue with using the current MySQL Sakila Sample Database as a test with mysqltest.

I was running an older version of 5.1.7 beta so I figured the best course of action was to upgrade to 5.1.11 beta.

Well the problem still exists, and I found that the cause was due to the syntax of the DELIMITER command. Assuming the creation of the schema tables, here is an example of what I found.

Running in an interactive mysql session the following works

DROP TRIGGER ins_film;
DELIMITER ;;
CREATE TRIGGER `ins_film` AFTER INSERT ON `film` FOR EACH ROW BEGIN
    INSERT INTO film_text (film_id, title, description)
        VALUES (new.film_id, new.title, new.description);
  END;;
DELIMITER ;

On a side note, why does DROP TRIGGER not contain the IF EXISTS syntax like DROP DATABASE and DROP TABLE. It’s a shame, it’s a nice SQL extension that MySQL provides.

Now running the same SQL in a mysqltest test (with appropiate create tables) produces the following error

TEST                            RESULT
-------------------------------------------------------
sakila-trigger                 [ fail ]

Errors are (from /opt/mysql/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 53: Extra delimiter "" found
(the last lines may be the most important ones)

It seems, that the DELIMITER command within mysql accepts a line terminator as a statement terminator, while in mysqltest, this is not so.
The solution is to be more strict in the mysqltest, using the following syntax, note the first and list DELIMITER lines are terminated by the appropiate delimiter in use.

DELIMITER //;

CREATE TRIGGER `ins_film` AFTER INSERT ON `film` FOR EACH ROW BEGIN
  INSERT INTO film_text (film_id, title, description)
  VALUES (new.film_id, new.title, new.description);
END//

DELIMITER ;//

Surprisingly, this syntax does then not work in mysql? Is this a bug? I’m note sure, perhaps somebody could let me know.

Using PBXT 0.9.5

Paul has released Version 0.95 of his PBXT MySQL Transactional Storage Engine.

Here is what I did to get it operational under CentOS 4.3.

su -
useradd pbxt
su - pbxt
wget http://www.primebase.com/xt/download/mysql-4.1.16-pbxt-0.9.5.tar.gz
tar xvfz mysql-4.1.16-pbxt-0.9.5.tar.gz
cd mysql-4.1.16-pbxt-0.9.5
./configure --with-debug=full --with-pbxt-storage-engine --without-query-cache --with-mysqld-user=pbxt  --prefix=/home/pbxt/mysql
make
make install
scripts/mysql_install_db
cd /home/pbxt/mysql
./bin/mysqld_safe --user=pbxt --basedir=/home/pbxt/mysql &

Now, lets test and confirm PBXT is operational.

bin/mysql -uroot mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or g.
Your MySQL connection id is 1 to server version: 4.1.16-pbxt-0.9.5-debug

Type 'help;' or 'h' for help. Type 'c' to clear the buffer.

mysql> show engines;
+------------+---------+------------------------------------------------------------+
| Engine     | Support | Comment                                                    |
+------------+---------+------------------------------------------------------------+
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance     |
| HEAP       | YES     | Alias for MEMORY                                           |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables  |
| MERGE      | YES     | Collection of identical MyISAM tables                      |
| MRG_MYISAM | YES     | Alias for MERGE                                            |
| ISAM       | NO      | Obsolete storage engine, now replaced by MyISAM            |
| MRG_ISAM   | NO      | Obsolete storage engine, now replaced by MERGE             |
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys |
| INNOBASE   | YES     | Alias for INNODB                                           |
| BDB        | NO      | Supports transactions and page-level locking               |
| BERKELEYDB | NO      | Alias for BDB                                              |
| NDBCLUSTER | NO      | Clustered, fault-tolerant, memory-based tables             |
| NDB        | NO      | Alias for NDBCLUSTER                                       |
| EXAMPLE    | NO      | Example storage engine                                     |
| ARCHIVE    | NO      | Archive storage engine                                     |
| CSV        | NO      | CSV storage engine                                         |
| BLACKHOLE  | NO      | Storage engine designed to act as null storage             |
| PBXT       | YES     | Super fast transactional, row-level locking engine         |
+------------+---------+------------------------------------------------------------+
18 rows in set (0.00 sec)

Now to take if for a spin with the MySQL Sakila Sample Database. At this time I need a modified version of the schema, as the sample database is designed for Version 5.

Download sakila.mysql4 script.
Download sakila.pbxt.sql script.

bin/mysql -uroot mysql
mysql>  source sakila.mysql4
mysql> source  sakila.pbxt.sql
# NOTE: Alter table of film_text fails as expected due to FULLTEXT index
mysql> show create table actor;
mysql> exit;

A few warnings in my sakila.mysql4 script I’ll need to review, but it looks ok from an installation step. Now back to my tests that I’m writing to use the Sakila Sample Database which I was working on, so I can really test the ACID state and performance of PBXT.

Using the MySQL Test Suite

MySQL provides two different tools to test the MySQL Server with SQL statements. One is mysqltest and in 5.1 mysqlslap. Both of these tools have quite different purposes. This is a quick review of the usage of mysqltest.

Current Usage

Under Linux deploys, the README in the mysql-test directory gives you all the information you need to know to run.

To run the full test suite.

cd /opt/mysql/mysql-test
 ./mysql-test-run

Rather easy, it does take some time, and I was surprised to find a number of tests marked as ‘skipped’. The general purpose of having tests in a product is to provide coverage of software functionality, and tests should always be forward compatible, unless they are no longer applicable and should be removed. I wonder why the time wasn’t taken to ensure they work in each release.

Should you already have a mysqld running, you can also use this using the -extern syntax to run specific tests against the server. For Example.

./mysql-test-run --extern alias analyze

There are over 700 tests, so this approach is not practical for all tests. It is noted in the README that some tests can’t run in extern mode.
Looking a bit closer at both the server configuration and a running test sheds some light on the types of parameters that are used, and how mysqltest operates.

/opt/mysql/bin/mysqld --no-defaults --server-id=1 --basedir=/opt/mysql --port=9306 --port-open-timeout=380 --local-infile
--exit-info=256 --core --datadir=/opt/mysql/mysql-test/var/master-data --pid-file=/opt/mysql/mysql-test/var/run/master.pid
--socket=/opt/mysql/mysql-test/var/tmp/master.sock --log=/opt/mysql/mysql-test/var/log/master.log
 --character-sets-dir=/opt/mysql/share/mysql/charsets --default-character-set=latin1 --tmpdir=/opt/mysql/mysql-test/var/tmp
--language=/opt/mysql/share/mysql/english/ --innodb_data_file_path=ibdata1:128M:autoextend --open-files-limit=1024
--log-bin-trust-function-creators --loose-binlog-show-xid=0 --rpl-recovery-rank=1 --init-rpl-role=master --key_buffer_size=1M
--sort_buffer=256K --max_heap_table_size=1M --log-bin=/opt/mysql/mysql-test/var/log/master-bin --user=root

/opt/mysql/bin/mysqltest --no-defaults --socket=/opt/mysql/mysql-test/var/tmp/master.sock --database=test --user=root
--password= --silent -v --skip-safemalloc --tmpdir=/opt/mysql/mysql-test/var/tmp --port=9306 -R r/count_distinct3.result

Extending MySQL Test Suite

That’s all nice, but the power of this infrastructure is you can incoporate your own tests. This may be beneficial if you have complex statements and use bleeding edge versions of MySQL. Indeed, it’s another approach to being able to provide to MySQL reproducable test cases if you have a problem.

There are basically 2 steps to incoporating your own tests.

  1. Create a test script in the t subdirectory.
  2. Create a test results file in the r subdirectory.

Using the MySQL Sakila Sample Database as a test case, I created a test with the default sakila-schema.sql and sakila-data.sql files. I did need to replace the DROP and USE DATABASE commands and add a set of DROP TABLE statements.

cd /opt/mysql/mysql-test
vi t/sakila.test
./mysql-test-run --record sakila
more r/sakila.results

Example Results

./mysql-test-run sakila

Stopping master cluster
Installing Test Databases
Removing Stale Files
Installing Master Databases
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables     --basedir=.. --datadir=mysql-test/var/master-data --skip-innodb --skip-ndbcluster --skip-bdb
Installing Master Databases 1
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables     --basedir=.. --datadir=mysql-test/var/master-data1 --skip-innodb --skip-ndbcluster --skip-bdb
Installing Slave Databases
running  ../bin/mysqld --no-defaults --bootstrap --skip-grant-tables     --basedir=.. --datadir=mysql-test/var/slave-data --skip-innodb --skip-ndbcluster --skip-bdb
Manager disabled, skipping manager start.
Loading Standard Test Databases
Starting Tests

TEST                            RESULT
-------------------------------------------------------
sakila                         [ pass ]
-------------------------------------------------------

Ending Tests
Shutting-down MySQL daemon

Master shutdown finished
Slave shutdown finished
All 1 tests were successful.

Now, of surprise with my present version of 5.1.7 beta, a number of statements from the sakila-schema failed.
You can download my working version of the sakila.test from above here. The first failure was:

TEST                            RESULT
-------------------------------------------------------
sakila2                        [ fail ]

Errors are (from /opt/mysql/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 220: query ';
CREATE TRIGGER `ins_film` AFTER INSERT ON `film` FOR EACH ROW BEGIN
INSERT INTO film_text (film_id, title, description)
VALUES (new.film_id, new.title, new.description);
END;;
CREATE TRIGGER `upd_film` AFTER UPDATE ON `film` FOR EACH ROW BEGIN
IF (old.title != new.title) or (old.description != new.description)
THEN
UPDATE film_text
SET title=new.title,
description=new.description,
film_id=new.film_id
WHERE film_id=old.film_id;
END IF;
END;;
CREATE TRIGGER `del_film` AFTER DELETE ON `film` FOR EACH ROW BEGIN
DELETE FROM film_text WHERE film_id = old.film_id;
END;;
DELIMITER ;
--' failed: 1065: Query was empty
(the last lines may be the most important ones)

Now is the time to upgrade to the latest 5.1.11 beta to confirm operations and isolate the errors into smaller tests, and hopefully all is well, if not, I’ve got something to contribute back.

QMail Admin

For some reason my vqadmin failed to create a new mail domain?? I know I had some /home permissions problems, but that was quite some time ago, perhaps I haven’t added a domain for mail in that time. Anyway, time to refresh my skills and software.

vqadmin

Home Page

cd /src
wget http://www.inter7.com/vqadmin/vqadmin-2.3.2.tar.gz
tar xvfz vqadmin-2.3.2.tar.gz
cd vqadmin-2.3.2
./configure --enable-cgibindir=/home/vqadmin/cgi-bin --enable-htmldir=/home/vqadmin/www
make && make install-strip
vi /opt/httpd/conf/httpd.include

        deny from all
        Options ExecCGI
        AllowOverride AuthConfig
        Order deny,allow

vi /home/vqadmin/cgi-bin/vqadmin/.htaccess
AuthType Basic
AuthUserFile /home/vqadmin/.passwd
AuthName vQadmin
require valid-user
satisfy any
chown nobody /home/vqadmin/cgi-bin/vqadmin/.htaccess
chmod 644 /home/vqadmin/cgi-bin/vqadmin/.htaccess
htpasswd -bc /home/vqadmin/.passwd admin admin_password
chmod 644 /home/vqadmin/.passwd
apachectl stop
apachectl start

All fixed. For a step by step guide to all Qmail software I recommend http://qmailrocks.org

Using Cascade in Foreign Keys

For those that are using a Referential Integrity based Storage Engine in MySQL, the use of Foreign Keys can provide an extra level of data integrity within your application. There are of course caveats to this statement, such as correct design, proper transactional use etc, but that’s a topic for another time. The use of CASCADE has an effect on this.

So, just a quick refresher, using the MySQL Sakila Sample Database (if you don’t have it, get it. This can be a valuable tool in many other areas). Let’s first check out the Schema Diagram. We will look at the first relationship in the top left corner of the diagram.

There is a defined relationship with the actor and film_actor tables. An actor record can belong to one or more film_actor records, and a film_actor record must have one corresponding actor record.

Schema Definition

CREATE TABLE actor (
  actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(45) NOT NULL,
  last_name VARCHAR(45) NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (actor_id),
  KEY idx_actor_last_name (last_name)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE film_actor (
  actor_id SMALLINT UNSIGNED NOT NULL,
  film_id SMALLINT UNSIGNED NOT NULL,
  last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY  (actor_id,film_id),
  KEY idx_fk_film_id (`film_id`),
  CONSTRAINT fk_film_actor_actor FOREIGN KEY (actor_id)
         REFERENCES actor (actor_id) ON DELETE RESTRICT ON UPDATE CASCADE,
  CONSTRAINT fk_film_actor_film FOREIGN KEY (film_id)
         REFERENCES film (film_id) ON DELETE RESTRICT ON UPDATE CASCADE
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

Explanation

The above shows the CREATE TABLE syntax as provided in Version 0.8 of the Sakila Sample Database. Looking specifically at the constraint we wish analyse.

  CONSTRAINT fk_film_actor_actor FOREIGN KEY (actor_id)
         REFERENCES actor (actor_id) ON DELETE RESTRICT ON UPDATE CASCADE,

The FOREIGN KEY for a specific column REFERENCES an appropiate table and column, and includes optional UPDATE and DELETE actions.

The ON DELETE RESTRICT is indeed optional as this is the default operation, but is a good practice for both readibility and future compatibility.
The ON UPDATE CASCADE in this case, is indeed unwarranted as the actor primary key is an AUTO_INCREMENT column and should never be updated. By good design definition, a PRIMARY KEY should never be updated in a table, and then should be set to RESTRICT. If however you ever wish to update a primary key value in RESTRICT mode, you can’t as to first UPDATE children before the parent is to set the value to an unknown value that values validation. The best approach here is always use AUTO_INCREMENT primary keys, and never update the values.

Use

So let’s take the current implementation for a spin to see how the default RESTRICTED implementation operates.

mysql> select count(*) from film_actor where actor_id=1;
+----------+
| count(*) |
+----------+
|       19 |
+----------+
1 row in set (0.05 sec)
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from actor where actor_id=1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails
    (`sakila/film_actor`, CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`)
   ON UPDATE CASCADE)
mysql> rollback;
Query OK, 0 rows affected (0.06 sec)

As you can see, an attempt to delete an actor fails due to a foreign key constraint failure.

The correct implementation for this design is to delete child relationships first. For example.

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from film_actor where actor_id=1;
Query OK, 19 rows affected (0.00 sec)
mysql> delete from actor where actor_id=1;
Query OK, 1 row affected (0.00 sec)
mysql> rollback;
Query OK, 0 rows affected (0.06 sec)

Let’s now try CASCADE.

mysql> alter table film_actor drop foreign key fk_film_actor_actor;
Query OK, 5462 rows affected (0.78 sec)
Records: 5462  Duplicates: 0  Warnings: 0
mysql> alter table film_actor add CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`) ON DELETE CASCADE;
Query OK, 5462 rows affected (0.69 sec)
Records: 5462  Duplicates: 0  Warnings: 0
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from film_actor where actor_id=1;
+----------+
| count(*) |
+----------+
|       19 |
+----------+
1 row in set (0.01 sec)
mysql> delete from actor where actor_id=1;
Query OK, 1 row affected (0.00 sec)
mysql> select count(*) from film_actor where actor_id=1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
mysql> rollback;
Query OK, 0 rows affected (0.00 sec)

As you can see, by changing the definition to CASCADE, the requirement to delete all child rows is nullified.

Considerations

When designing I try to avoid the use of CASCADE. While it has it’s benefits, it is a “catch” by the database, and can hide poor design and coding techniques. In can however in more heavily loaded systems, provide a performance improvement, as additional commands for deleting don’t incur the network communications overhead. I’ll need to add this as a note to my performance testing guidelines to see what effect this has.

Another benefit on a more strict approach is MySQL for example currently does not fire triggers resulting on cascading constraint operations.

Try it yourself

As you can see it’s easy to use. To try it yourself.

  • Use an appropiate storage engine, e.g. InnoDB. You can use ALTER TABLE [table] ENGINE=InnoDB; on your tables.
  • Add an index to the target (foreign key) column, your source (primary key) will have an index by definition of being a primary key.
  • Ensure your source and target columns are the same data type
  • Create the appropiate constraint.

While the MySQL Manual refers to both the source (primary) and target (foreign) columns must have Indexes, I’ve found this not to be the case in 5.1. An ALTER TABLE will auto create the target (foreign) columns index if it doesn’t exist.

Just note, traps for young players. Both the source (primary) and target (foreign) columns must have the identical datatype definition. A common problem I’ve seen is that UNSIGNED is used for a Primary Key definition and not for a foreign key definition.

References
Wikipedia definition of “Referential Integrity”
MySQL Manual Page – FOREIGN KEY Constraints
MySQL Sakila Sample Database Forum
MySQL Forge Wiki on Sakila Sample Database

Guidelines for managing embedded external project dependencies

I’ve yet to find any Java project that doesn’t have dependancies on some other Open Source external libraries. I’ve yet to find a Java project that manages these external dependencies appropiately for support and integration at an enterprise level.

As with most projects, understanding an applying sound principles that scale will help you at a later date, and generally the cost of implementation is minimual at the start, but of course becomes more expensive when it’s really needed. The classic case is Version Control. For over 10 years, even on small single developer projects, I’ve used Version Control, it should be taught at university as an introduction to good programming design, it would greatly benefit software development and maintenance.

Back onto the topic of hand. Let’s use a moderate Java Web Based application, and for the purposes of this discussion the following Open Source external libraries are incoporated. Log4J, JUnit, Canoo WebTest, MySQL JDBC, Apache Commons (Collections, DHCP, Pool, HTTPClient, Taglibs Mailer). I could continue, but this will suffice for the demonstration.

It’s very easy for your project to include the appropiate jar’s such as (log4j.jar, junit.jar, commons-pool.jar etc), however this is where support and integration with other products fall down.

A Controlled Approach

You need to keep a seperate repository (under source code control of course) of your external libraries, and this becomes the source across all corporate projects. This is to include the following for each library:

  • The actual deployed jar
  • The matching source code of the deployed jar
  • Java Documentation of deployed jar

Versioning

Log4J is an example of an Open Source project that does version their jars, will many other open source projects do not. Why don’t they? Well one reason is to enable people to upgrade easily but simply overriding existing versions, and processes that have specific CLASSPATH’s are not affected. Generally today, implementations of software include all jar’s within a specified directory so I don’t see the problem.
Log4J gives in this example a log4j-1.2.12.jar for deployment purposes. When libraries do not include a version number, the are to be specifically added. This adds another small delemma of standards. The general practice is to use the hyphen ‘-‘. followed by the product version using the full-stop ‘.’, however there are projects that don’t follow this.

Version Recording

So now we have for each external library, an appropiately versioned jar, and matching source and documentation. This is the initial baseline. What’s needed is a simple HTML index that manages this information for use. The Index should include:

  • Product Name
  • Product URL
  • Repository Version
    • Version Number
    • Version Date
    • Download Date
  • Latest Version
    • Version number
    • Date
    • Comment

You may ask, why do you record the Latest Version, when the practice should be to always get the latest version. JUnit is a good example, the present version 4.x, requires a JDK 1.5.x deployment, and if your application is running only 1.4.x, then you can only use older 3.8.x versions.s mentioned earlier,

Management

Having an index of external libraries is one thing, correct use and management is the most important step. Let’s assume we have taken the time to download and document the required libraries from our example, and everything has been deployed into our first project.

Now, a 2/3 month task of checking for updated versions can be scheduled. Withing this process, newer versions can be downloaded and recorded appropiately. In our example, Log4J now has 1.2.13. Updating the external libraries repository is the simple part, the next step is to notify all coporate projects of the new version, and to encourage uptake. This may not always occur in a timely manner, but with at least this baseline in place and when there are issues, standardisation on the known coporate version is the first step.

Dependencies

Within each project libaries, a readme that details which versions of which external jars are included andwhen they were last updated from the repository should be done. Noting this information with the both the external libraries repository and the project repository provides a paper trail. In addition, should there be any exceptions, this is the place this information can be reported.

External Projects

Canoo WebTest is a good example of an external project that also includes other external libraries such as Log4J, JUnit, HtmlUnit, NekoHtml.
Problems arise when these products may use and implement older or unknown versions of libraries.

Internal Projects

Having internal projects that are dependent on other internal projects is nothing new in a large corporate enterprise. The problem arises when a spaghetti of undocumented dependancies causes a management nightmare. Let’s take this real life example.

Product A has included jars of Product B, Product C, Product D and Product E. The Product C actually has a different version of Product D embedded within in. Product D which now is included twice also includes Product E, so there are now three copies of this, all are different in size, and all a binary only with no version numbers, and no corresponding matching source code. Does this sound bad? Well it is. How it ever worked is still amazing.

This mess could have been managed first with Version Control (a basic 101 in software development), and an appropiate management of external libraries, and a similar approach to internal libraries.

An Example

This is a great example to highlight the cost in lack of appropiate management. I was supporting an existing large scale project (1000+ users) (let’s call this Product A), and the integration of a new project (let’s call this Product B) had been passed from the development team for implementation, testing and release. A threaded process, it would simply just hang after some initialisation, no notice, no errors, just nothing. Not withstanding that something should have been better reported for the errors. Due to 7 possible log files between the software application and the application server, nothing was reported, but that’s another topic.

The final result was Product B had introduced the use of org.apache.log4j.Logger.trace(), a new more granular logging then debug(). The appropiate Log4J jar had been included in the product, and this was Version 1.2.12. However, Product A, which was using Product B, was bundled with an earlier version of Log4J, Version 1.2.8, and this version didn’t support this new method.

While it took a few hours of debugging to find this problem, it was made easier because at least these jar’s were version, of the 20-30 jars across products only 3-4 were versioned. Similar problems with QName and XMLBeans unamed jars prior to this took days to resolve (indeed one had to be worked around as it couldn’t be resolved).

A further complication in this process was when Product B was introduced. This was developed and built under Linux, while Product A was still being maintained under Windows. From the experience of Integration is was found that the order of loading within the classloader of a commerical application server differed between operating systems.

What constitutes a good error message to the user?

Today, will go down in my professional history as quite possibly the lowest I would ever think of a software developer. I’ve carefully avoided the term “fellow coder”, speaking of a IT industry sticking by fellow IT people, but not today.

I presently support an existing production system, some 1000+ users that’s been in place around 3 years in which I’ve had no prior involvement until recently. Integration with other external parties within the system have provided an esclation in errors being reported in this external communication, and lack of adequate feedback to the user is another topic. Email is the means of reporting administratively of user errors, again another topic of issue. Within these emails, which are almost impossible to manage due to the limited internal GUI only toolset and lack of access to actual email account files to automate scripting (yet another topic? Do you see a trend here), is some relevent information regarding the transaction, and then most importantly the error message. The thing I need to investigate the problem.

The line reads (removing some stuff):


Error Code: 234567892

Ok, well a little cryptic, but surely you can work out from the external system what this means. Investigation of some more errors, in the mail GUI product, yet another series of open windows (you can’t view messages like a regular mail client with a summary list and a detail panel), provides for a trend in the error messages:


Error Code: 1235492567
Error Code: -434783134
Error Code: 34345199

The trend being there is none. Of course today by mid morning the email error count is into the hundreds, and I’m none the wiser. Well time to closely investigate the code management (as I’ve already contacted the external party, and asked if I can provide some error codes to receive greater information).

The following constitutes the two lines of code taken in the determination of the error messages I’ve shown so far. Note, this code takes the external system response, and then “attempts to determine usefull error content for presentation back to the user”.


errorNo = new Random().nextInt();
error = “Error Code: ” + errorNo;

Now while everybody laughed out loud, including fellow developers, DBA, IT manager, the Business Owners and Users (which can’t read code but could understand the first of these two lines I highlighted), and yes it really was funny in context with a bigger picture, but really it wasn’t funny at all. Some things make my blood boil, and this was one of them. With all the time lost between multitudes of people, users, call centre etc, I’d never felt a stronger conviction to hunt down the developer that wrote this.

The end of the story is after even trolling old CVS repository entries I was unable to piece sufficient information to determine the author. Most likely done pre version control, and then that trail leads to only a few names I’ve heard mentioned before.

I’d like to see somebody top that one!

Open Office Spell Checker

I don’t know why it just isn’t installed by default, but Open Office 2 didn’t install any dictionaries. What’s more depressing, is you go to do a spell check, and it simply states Spelling is Complete. There is no menas to install via Open Office.

What you have to do is download a document that runs a macro to enable you to choose and install the necessary dictionaries. Weird, but it works. And the magic link is:

http://wiki.services.openoffice.org/wiki/Dictionaries

It was http://lingucomponent.openoffice.org/dictpack.html

FireFox's Live Bookmarks

I’m just about to launch a new project I’ve been working on in the past week. It has a RSS feed, and I wanted to ensure that within FireFox, this could be picked up as a live bookmark. This alone is a very cool feature. Using my Blog for reference, the following code is used.


	<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://blog.arabx.com.au/?feed=rss2" />
	<link rel="alternate" type="text/xml" title="RSS .92" href="http://blog.arabx.com.au/?feed=rss" />
	<link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://blog.arabx.com.au/?feed=atom" />

So adding a RSS2 live bookmark link is as simple as a header tag.

References
Live Bookmarks – Mozilla Description
Good Introduction Tutorial

Linux One Liner – Finding Stuff

Let’s say you created a file in your home directory but can’t work out which directory you put it in.


$ find ~ -name somefile.txt

You can replace ~ (tilda) with a directory e.g. / (slash) for search all locations on your system.

Let’s say you want to find all the JPEG’s you have.

$ find ~ -name "*.jpg"

Now to improve on this, I know I put a JPEG somewhere in the past few days, give me just the files from the past 3 days.

$ find . -name "*.jpg" -mtime -3

And what if you only wanted files greater then 2MB

$ find . -name "*.jpg" -mtime -3 -size +2M

If you want to look at a more detailed listing of the files found, like the format you are familar with using ls, try this.

$ find . -name "*.jpg" -mtime -3 -exec ls -l {} ;

You can find more then files, lets find all the directories you have.

$ find ~ -type d

I haven’t added it, but historically you needed to add -print on the end to print the results, seems nowadays this is optional.

I briefly used the -exec option above, I used it for various purposes. Here are a few.

$ find /backup-location -name "backup.*.tar.gz" -mtime +3 -print -exec rm -f {} ;
$ find . -name CVS -exec rm -rf {} ;

The first I run against my backup directory, that removes the online backups older then 3 days. Of course I’ve also got offline backups.
The second is when I checkout stuff from CVS and I want to prune all the CVS information. NOTE: Using the rm -rf command is very dangerous, you should only use this when you know your stuff. Used in the wrong way you delete everything, if you don’t have backups, there ain’t any UNDO in Linux. Also if you do it as root, you can effectively kill your installation in a split second.

There are other commands that perform various level of finding (e.g. commands via path) and other various stuff. A topic for another time, but to entice you.


$ which find
$ whereis find
$ locate find

Linux One Liner – Parsing long HTML urls

Ever wanted to look at a long HTML URL more easily, say to investigate a parameter. Here is a search from MapQuest.

http://www.mapquest.com/maps/map.adp?formtype=address&addtohistory=&address=10%20Market%20St&city=San%20Francisco&state=CA&zipcode=94111%2d4801&country=US&geodiff=1


$ echo "[insert url here]" | | tr "&?" "n"

This produced for the above URL the following output.

http://www.mapquest.com/maps/map.adp

formtype=address
addtohistory=
address=10%20Market%20St
city=San%20Francisco
state=CA
zipcode=94111%2d4801
country=US
geodiff=1

The Translate command tr does however replace both the & and ? characters. There are of course many more approaches like.


echo "[insert url here]" | sed -e "s/&/\n/g" -e "s/?/\n/g"

You can easily preserve the & and ? characters extending the syntax with

echo "[insert url here]" | sed -e "s/&/\n&/g" -e "s/?/\n?/g

This produces.

http://www.mapquest.com/maps/map.adp

?formtype=address
&addtohistory=
&address=10%20Market%20St
&city=San%20Francisco
&state=CA
&zipcode=94111%2d4801
&country=US
&geodiff=1

Now don’t get me started with the awk command. One of my popular books is Sed & Awk. If you do any detailed Shell scripting, this is a very handy guide.

Linux One Liner – Security

Here are a few useful one liners for Linux Security. View current packet filtering rules. (i.e. what can and can’t access your computer.

$ iptables -L

On older distros, iptables may not be in place. Try ipchains. A good reference and tools on iptables can be found at www.iptablesrocks.org.

Identity open ports on your installation using the Network exploration tool and security scanner.


$ nmap -p 1-65535 localhost

On my computer this returned

Starting nmap 3.70 ( http://www.insecure.org/nmap/ ) at 2006-06-11 12:22 EST
Interesting ports on lamda.arabx (127.0.0.1):
(The 65525 ports scanned but not shown below are in state: closed)
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
111/tcp open rpcbind
139/tcp open netbios-ssn
445/tcp open microsoft-ds
631/tcp open ipp
901/tcp open samba-swat
8005/tcp open unknown
32769/tcp open unknown
34315/tcp open unknown

That’s a cause for a bit of concern. Will need to look into that more.

Looking into more detail, I know what runs samba-swat but let’s confirm.


$ fuser -n tcp 901

This provides a confirmation and the Process id of the process using this port. A more susync output would be.

$ ps -ef | grep `fuser -n tcp 901 | tail -1 | cut -d: -f2` | grep -v grep

This gives me.

root 3356 1 0 Jun10 ? 00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid

Which is exactly right, Samba Swat (the web interface for Samba) which you access at http://localhost:901 is configured using xinetd.

Now to investigate some ports I didn’t know were open.

Linux One Liner – Using the manual

For users of Linux regardless of the skill level, using the OS manual is invaluable. Frank gives an example using crontab at Viewing a specific version of a man page, but as with Linux there is always more then one way to skin a cat.

To view a man page of a command e.g. du.

$ man du

The Unix Manual is generally broken down into 9 sections, and sometimes a manual page is in multiple sections. These section are:

  • Section 1 – Commands
  • Section 2 – System Calls
  • Section 3 – Library Calls
  • Section 4 – Special Files
  • Section 5 – File Formats and Conversions
  • Section 6 – Games for Linux
  • Section 7 – Macro Packages and Conventions
  • Section 8 – System Management Commands
  • Section 9 – Kernel Routines

As in Franks example, crontab is in both Section 1 and 5. Crontab tab the Linux Command, and the file format used for crontab. To get access to the later.

$ man -s 5 crontab

Frank made reference to a syntax of man crontab.5 which didn’t work in my distro, so again, different implementations may be possible.

Say you remember the command associated with cron but not the full name. You can search the man pages with.

$ man -k cron

This produced in my distro.

/etc/anacrontab [anacrontab] (5) - configuration file for anacron
anacron (8) - runs commands periodically
cron (8) - daemon to execute scheduled commands (ISC Cron V4.1)
crontab (1) - maintain crontab files for individual users (ISC Cron V4.1)
crontab (5) - tables for driving cron (ISC Cron V4.1)
hinotes (1) - Syncronize your Hi-Notes database with your desktop machine. Hi-Notes must be installed on your Palm handheld (and at least one entry must exist within Hi-Notes)
read-todos (1) - Syncronize your Palm ToDo application's database with your desktop machine

Of course you should not discount that a manual page exists for the man command.

$ man man

Some of Mike's Useful Sites

I haven’t seen any blogging in my normal readings from Mike lately, so taking a quick look directly at his blog. He has certainly been busy in other areas. Some great tibbits and websites.

Linux One Liner – Calculating Used Diskspace

You can easily see the state of diskspace used with the command.


$ df

However, often you want to know where most of the diskspace is being taken. A high level summary can be performed with.

$ du -k -s /* | sort +0nr -1

Producing results like.

23450208        share
9369212 home
3803504 usr
2395876 var
2015380 opt
920121  proc
815476  src
...

A more indepth review of the worst offending directories can be done with.


$ du -k / | sort +0nr -1 | head -30

This view does however show all offending directories so you normally have to ignore the higher levels as the are inclusive of the more specific directories where the most diskspace is.

You get a result like

47642425        /
23450208        /share
9799580 /home
9153228 /home/rbradfor
8497152 /share/bittorrent
7065840 /share/bittorrent/Stargate.SG-1.Season.9
4986368 /home/rbradfor/vmplayer
4837136 /usr
3659200 /opt
2559836 /home/rbradfor/vmplayer/ultimateLAMP
2447692 /var
2426364 /home/rbradfor/vmplayer/ultimateLAMP-0.1
2377732 /usr/lib
2335428 /var/lib
2213440 /var/lib/vmware
2213432 /var/lib/vmware/Virtual Machines
2174928 /share/lib
2174912 /share/lib/vmware
2174896 /share/lib/vmware/Virtual Machines
1972900 /home/rbradfor/download
1945576 /var/lib/vmware/Virtual Machines/XP Pro Dell 5150
1868016 /share/UltimateLAMP
1604032 /usr/share
...

References

  • df – report filesystem disk space usage
  • du – estimate file space usage
  • sort – sort lines of text files

MySQL Ideas

Seems I have over time, thought of many ideas, jotted some notes on some, and even done some work, but everybody knows that “home projects” can take a long time.

Here are a few that have resurfaced over the past month, and I doubt I’ll ever get to them, or perhaps some other enterprising person has already done a similar thing. Of course most are for my own personal/professional gratification, but input from others of “great idea, when do we see it” could sway my interests.

INFORMATION_SCHEMA for MySQL Version 4

Why?

Well, quering the INFORMATION_SCHEMA is very cool, and long overdue for information gathering including statistics, schema definitions, schema version comparision tools etc. Of course there are concerns regarding the performance of using the INFORMATION_SCHEMA, and any design should significantly consider this limitation. The INFORMATION_SCHEMA is not physically represented as a normal MySQL database with tables and real data files, and then can be optimised appropiately, queries reference various sources of information and return the results at query time.

How would I achieve this? Here is an approach. The biggest issue would be what programming technology and non-Linux support.

  • Generate a physical INFORMATION_SCHEMA create SQL script from the most current database version. This would allow you to create a real database with real tables.
  • Population. Well at least some tables will be empty, but some work would be required to the population of other information. A first pass with some easy shell scripting and SHOW commands would easily populate information for SCHEMATA (SHOW DATABASES), TABLES (SHOW TABLES), COLUMNS (DESC [table]). Some are covered by the mysql schema. You get the picture. Of course some information will be more more difficult to populate easily.
  • Syncing. Populating is one thing, but if the information is not current then it becomes useless. Depending on your organisations management, in a controlled environment, simply re-running the population process when schema information is changed, or run nightly may work. A more brut force approach may be to look at filesystem timestamps of the database table definitions and trigger when changes occur, but I suspect this would a poor approach.

Storage Engines in use

I’ve often wondered, do people use X storage engine.

I would love to see plastered on the MySQL Home Page stats of 99,999,999 Downloads, 99,999,999 Installations reporting in, and then some breakdowns of database versions, storage engine usage, even demographics. This type of statistical information can be great, but can also be dangerous to competitors, or can be too detailed to become useful. Consider the objective it a general census.

I recently was introduced to Fotolog, and in the top left corner is Members, Photos, and Photos today. Of course now they have volume the numbers are impressive. You blink, do a refresh and they have changed.

Having a more general information goal approach, rather then detailed would be my guidelines. Here are some points.

  • Each MySQL installation is registered providing it a unique id. This is used for all collection and communication, and there is never any real reference back to organisation/name etc. How this works correctly is a little difficult, but would need to involve some checksuming of information like hostname,os,mysql version,maybe ip.
  • A script runs to generate basic statisical information, including unique installation id, present GMT date/time, MySQL version, master/slave, installation date, number of databases, breakdown of storage engines used per database/installation, total number of tables per database/installation.
  • The Information needs to be recorded, so it can be transmitted, and also used for historical comparision, best option would a flatfile on the filesystem.
  • Having an automated approach to then say email this information to a known server. There would need to be some means to have some authentication, and feedback to confirm success. Using email is not an ideal approach, but is a more readily available medium. Another easy means would be a webservices approach, but would require a more direct internet connection, while mail could overcome this means.
  • More advanced ideas would be to record information in and XML format, have a XML storage engine, and enable reports to be run against this collated information, so the System Adminstrator has at there finger tips historical details that are gathered. The benefit of an XML approach is this can be more easily collated with clever approaches with XSLT for example. The fallback would could be a CSV file, but the information being sent, may well need to be more structured for various reasons.
  • Of course if it was limited to Version 5 up, the benefits of stored procedures, events and UDF’s a lot of this could be implemented in MySQL specific technologies, but what about the bulk of installations pre Version 5, you would definitely what to include these.
  • My concerns would definitely be SPAM abuse, or false figures, so these points alone may nullify the process even being worth it.

This is just a broad stroke overview, I’ve got more detailed analysis of pro’s, con’s, example file layouts, and a means of collating the information and providing dynamic reporting on the server.

Of course there are a few ideas I have implemented, and I’m still passionate about and working on in that “spare time” we all never have.

ShowCase MySQL

A showcase of MySQL (via existing popular Open Source apps). I’ve done this via a VMWare Appliance called UltimateLAMP.

I did this for two reasons, first of course I wanted to easily show applications that could be used without any technical requirements (indeed the hardest part is installing the free VMware player), and second, if there was more penetration of possible MySQL options in my present home city of Brisbane, it may provide some MySQL work opportunities. Of course, this an application approach to demonstrating MySQL, rather then a MySQL features approach, and is dependent then on the community and these applications may be less then ideal in a large scale deployment or may not be of a quality to be placed with MySQL, but it’s one way to get leverage.

I would like some way to poll usage of MySQL within organsiations my home town, but it’s a very difficult approach, and I’m sure some organisations would not even know they are using MySQL. The process may also provide a means of providing a service, say quarterly updates of software (MySQL and open source products), customisations, but I think it would be a difficult sell. Having an army of people applying these ideals across a number of cities may provide a more marketable approach.

ACID Performance Benchmark

Leveraging the Apache project JMeter, which has all the infrastructure for handling theads, reporting, graphing, and volume load testing, I’ve written to enhancements for JMeter JDBC Sampler, the first was CALL support, enabling calls to stored procedures. The second is TRANSACTION support enabling a number of SQL statements to be execute in a true transaction, all pass or all fail.

These allow me to provide an application specific performance approach. Now MySQL has various means of testing, the mysql-test that comes with installations (I’ve got more on this for a later time), as well as mysqlslap (again, some more points on this at a later time). Both of these tools have benefits, but I wanted a more application specific approach. Using JMeter provides an overhead, but it provides a more realistic application approach, my goal is not to get the maximum thoughtput possible, being as close to the screws, but to provide a more realistic approach.

Of course, my objective is to use application specific SQL, the SQL you run, this would need to be gathered via various means. Information gathering including what types of transactions, the interactions, the volumes of use would be needed to create realistic approaches. The benefits of JMeter for example, is I can simulate various transactions, enable them to interact, enable them to have implied delays to simulate user use. These are some of the advantages over the current MySQL performance approaches.

Providing this level of simulation however is only half the requirement. This provides for a benchmark for performance, but what about validation. Sure the system can be used with 100 threads with X Insert Orders, and Y Select Orders, and Z Invoice Prints etc per thread, but what if the there were issues and errors.

Data Quality is a key interest and this level of benchmarking also has to have the extent of validating the results, and taking appropiate actions for failures. I’m planning also that this can be used by enabling Referential Integrity (via an appropiate storage engine) on an existing application and then confirming transactional state of queries within applications that are not presently transaction safe.

Presently, I using this approach to assist in the testing of the MySQL Community Storage Engine PBXT. I’m hoping in conjunction with the Sakila Sample Database and various data sets, that I can also publish comparision results with other Storage Engines.

Schema Designer Analyser

My background is database modelling and large systems design, and I’m writing a A MySQL Guide to Database Modelling, Design and Support (still under development). This is an online and very simple approach for open source development to improve it’s image in database modelling. Keeping things simple will help all those non-professionals out there.

To back up this article, which has a checklist for organisations to follow and then incoporate into standards, is a tool to do the analysis that the checklists requires.

I had hoped to initial develop this using the GRT portion of the MySQL Workbench, but I’ve decided to make that a secondary project for now, focusing on a tehnology of greater understanding, possibly Java. Of course the biggest issues is accurate gathering of schema information, so you can see why a INFORMATION_SCHEMA for MySQL version 4 would come in handy.

Conclusion

I’ve got a lot more ideas, and discussions with other MySQL peers rekind ideas I’ve thought of previously. I really should get to putting them all out there to see how many people think I’m crazy, or to find other crazy people to contribute, or even fund! One can dream.

Wrestling the Anaconda

I’ve decided to affectionally call the MySQL Workbench Product “The Anaconda”. It’s been a wrestle so far to get all the features and functionality I wanted in this product. Of course I’d much rather have seen this product at say version 0.5, or 0.6, as I would not feel as guilty towards my comments of a 1.0 product when I’m having issues. I also have great respect for Mike Zinner and this small team of GUI developers that are developing and supporting the MySQL GUI products. Nevertheless, here is my latest round of analysis of the product across various platforms.

Hardware

Machines

  1. Dell Inspiron 5150 P4 3.2GHz 1GB RAM, 80GB & 120GB HDD
  2. Generic Desktop PIII 866MHz
  3. Dell Inspiron 500 PIII 600MHz

Operating Systems

For the purposes of these tests I’m going to run multiple OS’s installed on seperate drives to attempt to isolate and reproduce problems.

  1. CentOS 4.3 (latest updates) and Ubuntu Dappa 6.06 RC (latest updates)
  2. CentOS 4.0 and RedHat Fedora Core 5
  3. Windows XP

MySQL Workbench Software Versions

I’m also going to be testing various versions from the MySQL Workbench downloads pages as ast June 2nd 2006. These include

  • Linux x86 generic RPM (statically linked against glibc 2.2.5) – Linux (x86, libc6) – mysql-workbench-1.0.6beta-1.i386.rpm
  • Linux x86 generic RPM (statically linked against glibc 2.2.5) – Red Hat Enterprise Linux 4 (x86) – mysql-workbench-1.0.6beta-1.rhel4.i386.rpm
  • Linux (non RPM package) – Linux (x86, glibc-2.2, “standard” is static, gcc) – mysql-workbench-1.0.6beta-linux-i386.tar.gz
  • Source downloads – Tarball (tar.gz) – mysql-workbench-1.0.6beta.tar.gz
  • Source downloads – Source RPM – mysql-workbench-1.0.6beta-1.src.rpm

The Problems

Presently I have a number of recent outstanding MySQL Bugs including Bug: #20106: Errors when starting the GRT Console, Bug: #20107: Export as PDF and Export an PNG fails and Bug: #20105: Unable to Export Model as Image.

Machine 1 – OS 1 – CentOS 4.3

su -
rpm -ivh mysql-workbench-1.0.6beta-1.i386.rpm
# Run as normal user
exit
mysql-workbench

So this successfully starts, however I’ve had many earlier failed attempts and build from source versions on this machine and platform.

Machine 1 – OS 2 Ubuntu Dapper 6.06

I haven’t had the chance to reboot my laptop and install my second drive that I’m currently building for this exercise. This is on my todo list for the next few days.

Machine 2 – OS 1 CentOS 4.0

$ uname -a
Linux marvin 2.6.9-5.0.3.EL #1 Sat Feb 19 18:26:49 CST 2005 i686 i686 i386 GNU/Linux
$ rpm -qa
pango-devel-1.6.0-7
pango-1.6.0-7
atk-1.8.0-2
atk-devel-1.8.0-2
glibc-2.3.4-2
glibc-headers-2.3.4-2
glibc-common-2.3.4-2
glibc-kernheaders-2.4-9.1.87
glibc-devel-2.3.4-2
glib2-2.4.7-1
glib2-devel-2.4.7-1
$ rpm -ivh mysql-workbench-1.0.6beta-1.i386.rpm
$ mysql-workbench
/usr/bin/mysql-workbench-bin: symbol lookup error: /usr/bin/mysql-workbench-bin: undefined symbol: pango_renderer_get_type

Realising this OS is now a little old, and having had issues compiling Workbench previously with requirements for new GTK packages as well as comments in bugs like ‘works with FC5′, I’ve taken to re-install a new OS.

Machine 2 – OS 2 Fedora Core 5

Using a fresh install selecting For Software Developers as the default installation environment.

su -
rpm -ivh mysql-workbench-1.0.6beta-1.i386.rpm
mysql-workbench
(mysql-workbench-bin:2784): GLib-GObject-CRITICAL **: gtype.c:2215: initialization assertion failed, use IA__g_type_init() prior to this function

(mysql-workbench-bin:2784): GLib-GObject-CRITICAL **: g_type_register_static: assertion `parent_type > 0' failed

(mysql-workbench-bin:2784): GLib-GObject-CRITICAL **: g_type_register_static: assertion `parent_type > 0' failed

(mysql-workbench-bin:2784): GLib-GObject-CRITICAL **: g_type_add_interface_static: assertion `G_TYPE_IS_INSTANTIATABLE (instance_type)' failed
...
(mysql-workbench-bin:2784): GLib-GObject-CRITICAL **: g_type_register_static: assertion `parent_type > 0' failed
/usr/bin/mysql-workbench: line 18:  2784 Segmentation fault      $PRG-bin $*

Seems I’m not the first person to experience this, already recorded as Bug #19595.

Checking the current rpm list against Compiling dependancies as a rough guide:

gtk2-devel-2.8.15-1
gtk2-2.8.15-1
glibc-2.4-4
glibc-devel-2.4-4
glib2-2.10.1-1
glib2-devel-2.10.1-1
pcre-6.3-1.2.1
libstdc++-4.1.0-3
libstdc++-devel-4.1.0-3
atk-devel-1.11.3-1
atk-1.11.3-1
pango-1.12.0-1
pango-devel-1.12.0-1

No libsigc++20, ,gtkmm, glibmm indeed these rpm’s are not even on the 5 FC5 Cd’s.
I think we need a pre-checker executable which checks the health of your OS to determine if all dependancies are in place.

Also of interest is the official GTK download site lists the current GTK as 2.8, yet the atk and pango versions that are included in FC5 aren’t available from the official download site???

tar xvfz mysql-workbench-1.0.6beta-linux-i386.tar.gz
mysql-workbench/bin/mysql-workbench

Same errors. Well at least that’s one thing.

I’m waiting on some feedback from the MySQL AB team before I go installing more software. In particular I’d like to get a run-time environment under FC5, rather then compiling from source.

Machine 3 – Windows XP

In summary, Workbench starts and works reasonably well here. Problem is the first model I created, I wanted to export, and the Export as Image option is disabled? Still waiting on a response here.

How I work

My work life is really fragmented at present, so I’ve decided a split approach in answer to Dave Rosenberg’s How I Work–what I have learned so far .

What is your role?
Support Developer providing sole support an internal client web based system (Java, Oracle) with 1000+ users producing >$1m daily revenue. Independent Database Consultant. Specialising in Database Modelling, Large Systems Design and Web Development Technologies. Strong proponent of Agile Development Methodologies (specifically Extreme Programming – XP).
What is your computer setup?
Dell Optiplex GX280 (P4, 1GB, 17″ CRT) running Redhat FC4.
(When I started only 2 months ago I got lumped with a similar speced machine running Windows XP, and upteen windows apps (like 4 different versions of TOAD for example) . This OS was contry to my understanding provided in the interview process.
Primary – Dell Inspiron 5150 (P4/3.2GHz, 1GB, 120GB, 15″ + 21″ CRT)
Secondary – Generic (P3/600MHz, 1GB, lots of HDD ~500GB, 17″ CRT)
Both with very impressive Saitek backlit keyboards (one red, one blue) Great at night. See image at bottom.
What desktop software applications do you use daily?
RedHat Fedora Core 4
Eclipse 3.1, FireFox 1.5, Thunderbird 1.5, WebSphere Application Server 5.1, J2DK 1.4.2, Oracle SQL Developer, Open Office 2, XMMS and SSH client (which I use most)
Due to legacy internal systems and support I also must run under Wine (Internet Exploder), and First Class (email client).
Not to stop there, I also must run under Citrix ICA Client apps (FocalPoint, Heat Call Logging, and Microsoft Word for internal forms that won’t work under OO2.) And before somebody suggests why not try VMWare or other clients, I have tried, but software like Focalpoint can’t install?
CentOS 4.3
FireFox 1.5, ThunderBird 1.5, Gaim, SSH, Skype, Open Office 2.
Maybe not all of these every day, but some combination of each day –> MySQL 4.1, MySQL 5.0, MySQL 5.1, MySQL Workbench, Eclipse 3.1, J2DK 1.4.2, J2DK 5.0, Apache Tomcat 5.0.28, Apache Httpd 2.0.53, JMeter.

Presently also configuring a new laptop drive running Ubuntu Dapper 6.06 RC. For the Record, Beta 2, Flight 6 and Flight 7 all failed.
What websites do you visit every day?
Internal Wiki (all day)
At Lunch Google Sci-Tech News
PlanetMySQL
Google Sci-Tech News
PlanetMySQL (if I haven’t already got to them).
What mobile device or cell phone do you use?
N/A NEC e606 3G phone (which I’ve had for probably 3 years now) operating on a 3G network with video calls, again for 3 years.
Do you use IM?
No. All access is blocked other then an internal Jabber server, that I use rarely, never for communication, just a cut/paste of command or syntax. Speaking of blocked, SSH access to my production servers is blocked, and even reading news like Skypes Call Out is blocked by WebSense. Extensively, however due to current employement policy, I’m very hamstrung unless before/after hours.
I use Gaim as I have AIM, MSN and Google Talk accounts, and Skype. My preference was always AIM, but as clients come and go, I’ve had to accumulate more accounts.
Do you use a VoIP phone?
No. Not at present, however for many years I worked with US clients and used Packet 8. Still have the hard phone somewhere.
I’ve also used Skype talk for one or one or conference calls. Of late in Australia to New Zealand and Singapore. Indeed, the quality to Singapore has been excellent, when living in the US, calls to Singapore on Skype were clearer then my Packet8 phone to US numbers.
Do you have a personal organization/time management theory?
Current contract employees a number of disjointed methods, which in observation just shows so much inefficency, it’s worth documenting just to highlight what not to do. We have daily team meetings (10 mins), each listing your top 2 daily tasks. Weekly we have to also submit weekly goals. Weekly combined meetings with another team where we again give weekly top 2 tasks. We use two seperate systems (with manual double entry) for work identification, one for call centre logging issues, and Bugzilla for software bugs, and enhancements. We also use XPlanner (again duplicating a lot of tasks) for time management.
With all this rigid structure, I am daily given either other work to do, or investigate, and in over 3 months, I would rarely end a week anywhere near where it was so described at the start of the week.
With all the technology possible, I actually do not have any electronic management gadget, never had. I use a combination of notebook, plain paper (usually for daily notes etc, which I discard regularly), a diary, and normally a lot of emails which I normally send to/from home.
Given that email is used so much, I basically use Draft Emails for any electronic notes.

Anything else?
Perhaps there is merit in How I work now, and How I’d like to work now.

Saitek Keyboard

A better VNC

I’ve been using VNCViewer from RealVNC under Linux to remote connect to an older machine running windows. Two reasons, I don’t need yet another screen on my desk, and I need windows to adequately test and use the MySQL GUI products, in particular MySQL Workbench.

I never realised there was a better way, particularly over a local network, a command called rdesktop which is capable of natively speaking Remote Desktop Protocol (RDP).

$ su -
$ yum install rdesktop

This did only give me version 1.3.1. The current version 1.4.1 available from the rdesktop Website does provide greater commands including the syntax I use.

su -
cd /src
wget http://optusnet.dl.sourceforge.net/sourceforge/rdesktop/rdesktop-1.4.1.tar.gz
tar xvfz rdesktop-1.4.1.tar.gz
cd rdesktop-1.4.1
./configure
make
make install

Example Usage

$ rdesktop -u <username> -p <password> -K -N -z -a 16 -x b -P 

Some additional common options include -d <domain> and -f for fullscreen.

Comand Syntax

$ rdesktop
rdesktop: A Remote Desktop Protocol client.
Version 1.4.1. Copyright (C) 1999-2005 Matt Chapman.
See http://www.rdesktop.org/ for more information.

Usage: rdesktop [options] server[:port]
   -u: user name
   -d: domain
   -s: shell
   -c: working directory
   -p: password (- to prompt)
   -n: client hostname
   -k: keyboard layout on server (en-us, de, sv, etc.)
   -g: desktop geometry (WxH)
   -f: full-screen mode
   -b: force bitmap updates
   -L: local codepage
   -B: use BackingStore of X-server (if available)
   -e: disable encryption (French TS)
   -E: disable encryption from client to server
   -m: do not send motion events
   -C: use private colour map
   -D: hide window manager decorations
   -K: keep window manager key bindings
   -S: caption button size (single application mode)
   -T: window title
   -N: enable numlock syncronization
   -X: embed into another window with a given id.
   -a: connection colour depth
   -z: enable rdp compression
   -x: RDP5 experience (m[odem 28.8], b[roadband], l[an] or hex nr.)
   -P: use persistent bitmap caching
   -r: enable specified device redirection (this flag can be repeated)
         '-r comport:COM1=/dev/ttyS0': enable serial redirection of /dev/ttyS0 to COM1
             or      COM1=/dev/ttyS0,COM2=/dev/ttyS1
         '-r disk:floppy=/mnt/floppy': enable redirection of /mnt/floppy to 'floppy' share
             or   'floppy=/mnt/floppy,cdrom=/mnt/cdrom'
         '-r clientname=': Set the client name displayed
             for redirected disks
         '-r lptport:LPT1=/dev/lp0': enable parallel redirection of /dev/lp0 to LPT1
             or      LPT1=/dev/lp0,LPT2=/dev/lp1
         '-r printer:mydeskjet': enable printer redirection
             or      mydeskjet="HP LaserJet IIIP" to enter server driver as well
         '-r sound:[local|off|remote]': enable sound redirection
                     remote would leave sound on server
   -0: attach to console
   -4: use RDP version 4
   -5: use RDP version 5 (default)

I haven’t reviewed the security implications but considering I’m only running in my own internal network, it’s not a major priority.

References

Improving Open Source Databases – WordPress

As part of both my UltimateLAMP side project, and my greater involvement with MySQL Workbench, I’ve been wanting to review and document the database schemas of Open Source products.

Indeed, as part of discussions with Mike Zinner of MySQL AB at the recent MySQL Users Conference, I suggested an idea used in the Java World, for example by Clover and Agitar, where to promote the usefullness of their respective tools (in this case code coverage), they provide results against Open Source Products. For an example check out Clover Examples.

With MySQL Workbench, to get some greater exposure of the use of graphical Data Modelling tool, I mentioned the idea of providing a respository of schemas, may help in exposure, usage and feedback of MySQL Workbench. Indeed my work is towards this being added to MySQL Forge, however I’ve found that at my first indepth work, using WordPress 2.0.2 has been a less then successful experience.

The schema would rate a 2 out of 5 on my scale of optimal database schemas. Here are some of my initial findings.

  • Inconsistent naming standards
    • ID uppercase for primary keys in some tables, but not in others.
    • different standards for primary keys, some id, some [table]_id
    • Inconsistent identification of foreign keys (e.g. post_parent). All keys should have _id.
    • Inconsistent datatypes for optional foreign keys (e.g. default of 0 rather then NULL)
    • Inconsistent column naming standards, some tables prefix columns with table name or table like name, (e.g. post_), but not all columns. Some tables has multiple prefix values (e.g. cat_, category_)
  • Poor usage of datatypes. Examples include BIGINT(20) for primary keys, TEXT for post title, INT(10) for columns which as values a TINYINT can support.
  • Inconsistent use of enums (e.g. Y/N for some values, and yes/no for others)
  • No Referential Integrity support (ie. Foreign Keys), limiting the benefits of a modelling tool showing table relationships
  • Different rules for primary keys (most tables have a suggorate key, but not all)

I am actually working on A MySQL Guide to Database Modelling, Design and Support, which covers a lot of these points and more. I’m some way from completing this document, but this initial research of the WordPress product highlights it’s importance to the open source community.

Of course we can just make noise about what we don’t like, or we can do something about it. I’m submitting my ideas, suggestions and code changes to the project. What is done about it is outside of my control, but I’ve at least given my input to this Open Source Product.

WordPress 2.0.2 Schema SQL (taken from a mysqldump after installation)

Here is a graphical representation of the Data Model from MySQL Workbench.
NOTE: My MySQL Workbench will not enable me to do a Export Image, so I’ll need to add this at a later time

Introducing Referential Integrity

Unfortunately this is no automated means of creating appropiate Referential Integrity for a schema. With more strict naming standards, it would be possible to reconstuct this information generally.

The approach to create these new rules were performed manually, but a tool could be developed to provide approximations for a number of these.

# Implement Foreign Keys. Changes include:
#  - Appropiate Storage Engine
#  - ensure column is of same definition as matching primary key (e.g. datatype + unsigned)
#  - create index for foreign key
#  - ensure optional keys use NULL

ALTER TABLE wp_categories ENGINE=InnoDB;
ALTER TABLE wp_comments ENGINE=InnoDB;
ALTER TABLE wp_linkcategories ENGINE=InnoDB;
ALTER TABLE wp_links ENGINE=InnoDB;
ALTER TABLE wp_options ENGINE=InnoDB;
ALTER TABLE wp_post2cat ENGINE=InnoDB;
ALTER TABLE wp_postmeta ENGINE=InnoDB;
ALTER TABLE wp_posts ENGINE=InnoDB;
ALTER TABLE wp_usermeta ENGINE=InnoDB;
ALTER TABLE wp_users ENGINE=InnoDB;

ALTER TABLE wp_usermeta MODIFY user_id  bigint(20) UNSIGNED NOT NULL DEFAULT '0';
ALTER TABLE wp_usermeta ADD FOREIGN KEY usermeta_users_fk (user_id) REFERENCES wp_users(ID) ON DELETE CASCADE;

ALTER TABLE wp_posts MODIFY post_author BIGINT(20) UNSIGNED NOT NULL;
CREATE INDEX post_author ON wp_posts(post_author);
ALTER TABLE wp_posts ADD FOREIGN KEY posts_users_fk (post_author) REFERENCES wp_users(ID) ON DELETE CASCADE;

ALTER TABLE wp_posts MODIFY post_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_posts SET post_parent = NULL WHERE post_parent = 0;
CREATE INDEX post_parent ON wp_posts(post_parent);
ALTER TABLE wp_posts ADD FOREIGN KEY posts_posts_fk (post_parent) REFERENCES wp_posts(ID);

ALTER TABLE wp_postmeta MODIFY post_id  bigint(20) UNSIGNED NOT NULL DEFAULT '0';
ALTER TABLE wp_postmeta ADD FOREIGN KEY postmeta_posts_fk (post_id) REFERENCES wp_posts(ID) ON DELETE CASCADE;

ALTER TABLE wp_categories MODIFY cat_ID BIGINT(20) UNSIGNED NULL DEFAULT NULL AUTO_INCREMENT;
ALTER TABLE wp_categories MODIFY category_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_categories SET category_parent = NULL WHERE category_parent = 0;
ALTER TABLE wp_categories ADD FOREIGN KEY categories_categories_fk (category_parent) REFERENCES wp_categories(cat_ID);

ALTER TABLE wp_post2cat MODIFY rel_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_post2cat MODIFY post_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_post2cat MODIFY category_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_post2cat ADD FOREIGN KEY post2cat_posts_fk (post_id) REFERENCES wp_posts(ID) ON DELETE CASCADE;
ALTER TABLE wp_post2cat ADD FOREIGN KEY post2cat_categories_fk (category_id) REFERENCES wp_categories(cat_ID) ON DELETE CASCADE;
ALTER TABLE wp_comments MODIFY user_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_comments SET user_id = NULL WHERE user_id = 0;
CREATE INDEX user_id ON wp_comments(user_id);
ALTER TABLE wp_comments ADD FOREIGN KEY comments_users_fk (user_id) REFERENCES wp_users(id) ON DELETE CASCADE;
ALTER TABLE wp_comments MODIFY comment_post_ID BIGINT(20) UNSIGNED NULL DEFAULT NULL;
ALTER TABLE wp_comments ADD FOREIGN KEY comments_posts_fk (comment_post_ID) REFERENCES wp_posts(ID) ON DELETE CASCADE;

ALTER TABLE wp_comments MODIFY comment_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_comments SET comment_parent = NULL WHERE comment_parent = 0;
CREATE INDEX comment_parent ON wp_comments(comment_parent);
ALTER TABLE wp_comments ADD FOREIGN KEY comments_comments_fk (comment_parent) REFERENCES wp_comments(comment_id);

ALTER TABLE wp_linkcategories MODIFY cat_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_links MODIFY link_category BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_links ADD FOREIGN KEY links_category_fk (link_category) REFERENCES wp_linkcategories(cat_id);

While there are no column name changes, by default the WordPress code should operate with this revised schema. Issues would include a poor delete strategy that still voilates the liberal constraints now enforced. Special handling of 0 for optional columns when the value is now NULL may also be a problem.

Revised Data Model in MySQL Workbench.
NOTE: My MySQL Workbench will not enable me to do a Export Image, so I’ll need to add this at a later time

Introducing improved Standards

Of course, I’d like to see the schema improve, here are my first draft ideas for improvements of the schema.

# Naming Improvements ===================================================================================
# Rename to lowercase id
ALTER TABLE wp_users CHANGE ID id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_posts CHANGE ID id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_categories CHANGE cat_ID cat_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_ID comment_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_post_ID comment_post_id BIGINT(20) UNSIGNED NOT NULL;

# Include _id for all Foreign Keys
ALTER TABLE wp_posts CHANGE post_author post_author_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_posts CHANGE post_parent post_parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_categories CHANGE category_parent category_parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_comments CHANGE comment_parent comment_parent_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;
ALTER TABLE wp_links CHANGE link_category link_category_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;

# Primary Key Name Standardisation
ALTER TABLE wp_posts CHANGE id post_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_users CHANGE id user_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;

# Foreign Key Standardisation
ALTER TABLE wp_categories MODIFY category_parent_id parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_comments CHANGE comment_parent_id parent_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;

# Other Column Standaisations
ALTER TABLE wp_categories MODIFY cat_name category_name VARCHAR(55) NOT NULL;

# Column width improvements
ALTER TABLE wp_categories MODIFY category_count MEDIUMINT(5) NOT NULL DEFAULT '0';
ALTER TABLE wp_posts MODIFY post_title VARCHAR(255) NOT NULL;
ALTER TABLE wp_linkcategories MODIFY cat_name VARCHAR(255) NOT NULL;
ALTER TABLE wp_linkcategories MODIFY cat_id TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_options MODIFY autoload ENUM('Y','N') NOT NULL DEFAULT 'Y';

# Obvious Index Improvements
# Make login a unique index
DROP INDEX user_login_key ON wp_users;
CREATE UNIQUE INDEX  user_login_key  ON wp_users(user_login);

These will require code changes and adequate testing, something I’m not hopeful would happen unless I drove it, but you never know, the next time I review the WordPress Schema I might be impressed.

Conclusion

This is just a broadstroke first review of the schema. As part of all good design, testing of the code, further review and refinement based on better understanding of the product, and also peer review are all important steps not yet undertaken.

Google Talk

Google has like many before it such as AOL, Yahoo and Skype created it’s own online chat program, which you can run with a windows program and which is very neatly integrated into Gmail.com. However I don’t use windows, I use Linux and by using the Jabber protocol Google Talk is designed to be compatible with other talk clients.

I just could not get the sucker to operate within Gaim successfully. Seems I was missing an advanced setting as found at http://www.google.com/support/talk/bin/answer.py?answer=24073 Now I have a successful login, I just need some other friends to join Google Talk

Seaching Google in Klingon or Elmer Fudd

Sometimes I wonder how I get to trivial information. Well in the Wikipedia article Google Hoaxes you will find that the Google search engine is valid in a number of unusual languages including Klingon. I’m a Star Trek fan, but that’s going a little to far, however I guess if they create Elvish Language the LOTR people would love it. My favourites are Elmer Fudd and Bork Bork Bork. For those that don’t know this line, it’s famous from the Chef in the Muppets.

 

There are plenty more, I noticed the added Hacker recently. What’s really funny is select a language, then go back to the languages page, and see if you understand the names of the other languages. 

My original Blog Post Some light hearted Google Fun