Data Modelling

I'm a data modeller. I specialise in this, and for a number of years on large projects I've been able to focus on this single task within the System Development Life Cycle for several months at a time. Unfortunately, what depresses me most is that I can't get a full-time position in what I'm an expert in. It's not a specialised skill that an organisation can use on a full-time basis unless it's a large organisation, and quite frankly, Brisbane isn't a market that can support a diversity of large organisations. (Caveat: I mean large organisations that are proactive in software development, not just large organisations that have significant IT requirements but don't work proactively.) This is why I also do Software Development, Database Administration, and even System Administration. Again, I'm not good enough to fill one of these positions in a larger organisation as an expert, but I can generally hold my own, often with surprising results. (Side note: even this week I was providing a possible solution and tool for system administration across a large organisation, and it was 5 minutes' work. Something the paid full-time system administrators were not providing!)

I had only just started reading Domas Mituzas's WordPress post, friendly look at query log. I didn't have to read far to see where it was going, and, well, I quite quickly turned off. Sorry Domas, I'm sure your concluding points were valid. This is my point, and it has been echoed in our local MySQL users group as well: the lack of appropriate database design in open source projects. There are several contributing factors, but I put one down to the "Hobbyist and the Professional Syndrome". That's a topic for further discussion, but in summary, here are the bullet points from a slide in a presentation I prepared.

Hobbyist Developer

  • Downloadable software and examples
  • Online tutorials
  • Books like Learn in 24 hours/For Dummies

Professional Developer

  • Formal Qualifications
  • Grounding in sound programming practices
  • Understanding of SDLC principles
  • Worked in team environment

Middle Ground Developer

  • Time to skill versus output productivity
  • Depends on environment and requirements

And on a final note, while I'm doing some raving: I find it criminal that organisations at times encourage a level of incompetence by promoting people who develop bad code into positions where they can ensure that bad code stays, and further business decisions only entrench the organisation along the wrong path. There are already enough poor software developers out there giving the industry a bad name, and the good ones are few and far between.

What do we do? How can we solve this problem? I don't think it can be solved now in the Open Source community. In organisations, adopting an Agile Development methodology such as Extreme Programming (XP) is a very good start; I've been working with it, or at least with its principles, for 6 years now.

PS: Modelling is actually spelt both Modeling and Modelling (2 l's) across the various variants of English. Just in case somebody wanted to comment.

Contributing to JMeter

As part of using JMeter to test PBXT, a new transactional storage engine for MySQL, I've been investigating the best approach for handling transactions. You can read more about earlier decisions in my post Testing a new MySQL Transactional Storage Engine.

I found that the JMeter JDBC Sampler only supports SELECT and UPDATE statements, not calls to stored procedures. Stored procedures are just one approach I'm considering.

Well, I guess it's time to contribute code to an Apache project. I've modified code and logged bugs for Tomcat before, but this will be my first attempt at modifying code and submitting a patch.

A summary of what I did (really for my own short term memory):

Now I just have to wait to see if it’s accepted. Regardless, it works for me. And that’s Open Source. FREEDOM

svn checkout jmeter

$ svn diff >
$ cat
---    (revision 388876)
+++    (working copy)
@@ -23,6 +23,7 @@
 import java.sql.ResultSetMetaData;
 import java.sql.SQLException;
 import java.sql.Statement;
+import java.sql.CallableStatement;

 import org.apache.avalon.excalibur.datasource.DataSourceComponent;
 import org.apache.jmeter.samplers.Entry;
@@ -45,6 +46,8 @@

        public static final String QUERY = "query";
        public static final String SELECT = "Select Statement";
+       public static final String UPDATE = "Update Statement";
+       public static final String STATEMENT = "Call Statement";

        public String query = "";

@@ -69,6 +72,7 @@
                log.debug("DataSourceComponent: " + pool);
                Connection conn = null;
                Statement stmt = null;
+               CallableStatement cs = null;

                try {

@@ -88,14 +92,19 @@
                                        Data data = getDataFromResultSet(rs);
                                } finally {
-                                       if (rs != null) {
-                                               try {
-                                                       rs.close();
-                                               } catch (SQLException exc) {
-                                                       log.warn("Error closing ResultSet", exc);
-                                               }
-                                       }
+                                       close(rs);
+                       // execute stored procedure
+                       } else if (STATEMENT.equals(getQueryType())) {
+                               try {
+                                       cs = conn.prepareCall(getQuery());
+                                       cs.execute();
+                                       String results = "Executed";
+                                       res.setResponseData(results.getBytes());
+                               } finally {
+                                       close(cs);
+                               }
+                       // Insert/Update/Delete statement
                        } else {
                                int updateCount = stmt.getUpdateCount();
@@ -112,20 +121,8 @@
                } finally {
-                       if (stmt != null) {
-                               try {
-                                       stmt.close();
-                               } catch (SQLException ex) {
-                                       log.warn("Error closing statement", ex);
-                               }
-                       }
-                       if (conn != null) {
-                               try {
-                                       conn.close();
-                               } catch (SQLException ex) {
-                                       log.warn("Error closing connection", ex);
-                               }
-                       }
+                       close(stmt);
+                       close(conn);

@@ -164,6 +161,38 @@
                return data;

+       public static void close(Connection c) {
+               try {
+                       if (c != null) c.close();
+               } catch (SQLException e) {
+                       log.warn("Error closing Connection", e);
+               }
+       }
+       public static void close(Statement s) {
+               try {
+                       if (s != null) s.close();
+               } catch (SQLException e) {
+                       log.warn("Error closing Statement", e);
+               }
+       }
+       public static void close(CallableStatement cs) {
+               try {
+                       if (cs != null) cs.close();
+               } catch (SQLException e) {
+                       log.warn("Error closing CallableStatement", e);
+               }
+       }
+       public static void close(ResultSet rs) {
+               try {
+                       if (rs != null) rs.close();
+               } catch (SQLException e) {
+                       log.warn("Error closing ResultSet", e);
+               }
+       }
        public String getQuery() {
                return query;

$ svn diff >
$ cat
---    (revision 388876)
+++    (working copy)
@@ -50,7 +50,7 @@
                p.setValue(NOT_UNDEFINED, Boolean.TRUE);
                p.setValue(DEFAULT, JDBCSampler.SELECT);
-               p.setValue(TAGS,new String[]{JDBCSampler.SELECT,"Update Statement"});
+               p.setValue(TAGS,new String[]{JDBCSampler.SELECT,JDBCSampler.UPDATE,JDBCSampler.STATEMENT});

                p = property("query");
                p.setValue(NOT_UNDEFINED, Boolean.TRUE);


Good to know somebody read my post and responded positively. The quickest way to get a patch considered is to log a Bugzilla request. It seems somebody already had, so it was easy for me to just contribute to Bug #38682.
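For comparison, here is a minimal plain-JDBC sketch of what the new Call Statement option does, assuming Java 7 or later. try-with-resources closes the CallableStatement automatically, which removes the need for hand-written close() helpers like those in the patch. The class name StoredProcCall and the procedure name in the comment are hypothetical; the {call ...} form is the standard JDBC escape syntax for stored procedures.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

/** Sketch: run a stored procedure call with plain JDBC (Java 7+). */
public class StoredProcCall {

    /**
     * Prepares and executes a call in JDBC escape syntax, e.g.
     * "{call refresh_rentals()}" (procedure name is hypothetical).
     * Returns a short response string, like the patch's "Executed".
     */
    public static String executeCall(Connection conn, String call) throws SQLException {
        // try-with-resources closes cs when the block exits, even if execute() throws
        try (CallableStatement cs = conn.prepareCall(call)) {
            // execute() returns true when the first result is a ResultSet
            boolean returnedResultSet = cs.execute();
            return returnedResultSet ? "Executed (result set returned)" : "Executed";
        }
    }
}
```

In real use the Connection would come from DriverManager or a pool, and it can go in the same try-with-resources header so that it is closed as well.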

Testing a new MySQL Transactional Storage Engine

Following my A call to arms! post about a month ago, I've had a number of unofficial comments of support. In addition, I've been approached to assist in the completion of a MySQL transactional storage engine. More information on the PBXT engine will be forthcoming soon from its creator.

Anyway, I've taken on the responsibility of assisting in testing this new storage engine. This also gives me an excuse to pursue some other ideas about the performance of different storage engines for different tables in business circumstances, such as MyISAM versus InnoDB in a heavily OLTP environment. Part of the testing will be to ensure ACID conformance in varying situations and under concurrent use. Of course, performance and load testing would be an obvious extension.

Deciding how I'm going to benchmark is an interesting problem. I of course want to use Java, my language of choice at present. This introduces another factor affecting performance; however, by using Java I'm simulating a more real-world environment, with programming overhead and a JDBC connector, rather than just measuring raw performance.

Laying out a plan, I would need to start from an existing database structure and data, be able to define SQL statements and transactions in bulk, and parameterise SQL during transactions. I would need to verify the state of the database after the transactions and clearly identify any invalid data. I would also need the ability to handle threads, and of course adequate reporting of my results.
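The threading and reporting requirements above could be sketched roughly as follows. This is a hypothetical harness, not JMeter's or mysqlslap's API: TransactionTask stands in for whatever parameterised JDBC transaction is under test, and the main method uses a fake task that fails every 100th iteration so the reporting can be seen without a database.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical load harness: N threads x M iterations of one transaction task. */
public class LoadHarness {

    /** One unit of work, e.g. a parameterised JDBC transaction (hypothetical). */
    public interface TransactionTask {
        void run(int thread, int iteration) throws Exception;
    }

    /** Minimal result summary for reporting. */
    public static final class Report {
        public final int successes;
        public final int failures;
        public final long elapsedMillis;

        Report(int successes, int failures, long elapsedMillis) {
            this.successes = successes;
            this.failures = failures;
            this.elapsedMillis = elapsedMillis;
        }

        @Override
        public String toString() {
            return successes + " ok, " + failures + " failed in " + elapsedMillis + " ms";
        }
    }

    public static Report run(int threads, int iterations, TransactionTask task)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int threadId = t;
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        task.run(threadId, i);
                        ok.incrementAndGet();
                    } catch (Exception e) {
                        // Count the failure and carry on with the next iteration
                        failed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            pool.shutdownNow(); // give up on stragglers; counts will be partial
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Report(ok.get(), failed.get(), elapsedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in task: a real one would execute SQL over JDBC.
        Report report = run(3, 1000, (thread, iteration) -> {
            if (iteration % 100 == 0) {
                throw new IllegalStateException("simulated failure");
            }
        });
        System.out.println(report);
    }
}
```

With 3 threads x 1000 iterations, the stand-in task should report 2970 successes and 30 failures. Catching exceptions per iteration, rather than per thread, keeps one failed transaction from stopping the rest of that thread's run.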

As of MySQL 5.1.4, MySQL includes a supported benchmarking tool called mysqlslap. I've discounted using it because, at this early stage, its documentation and exposure are limited, and I'm sure I'd still need to do other development.

Along comes JMeter. Within Java development I use JUnit quite extensively; it's key to the test-driven agile approach of Extreme Programming. In discussing this problem with a colleague on a new project, I found that JMeter is used for extensive load testing of web applications, but also performs database testing and supports integrating JUnit tests.

So yesterday I had a quick look at JMeter. Its capabilities for defining tests, reporting and threading are quite complete. It took literally minutes to install, configure, run an initial test and view results, all in a GUI. A little more work gave me scripted handling of my initial tests. I've posted my initial investigations earlier in JMeter – Performance Testing Software and JMeter and Ant Integration.

With this behind me, I just have to define the approach for more complete transactional tests that explicitly confirm the results (I'm hoping to achieve this with custom JUnit tests). If I can solve this, I can spend most of my time defining adequate tests. Let's see what the next few days' work provides.

JMeter and Ant Integration

Using Ant with JMeter, you can achieve remote running and web-based reporting.

I got ant-jmeter.jar and the sample results report stylesheet (.xsl) from Embedding JMeter with Ant (JMeter Ant Task).

cd /tmp
mv ant-jmeter.jar $ANT_HOME/lib

Within a new project directory, place your saved JMeter Tests (*.jmx) in a loadtests subdirectory, and the downloaded jmeter-results-report.xsl in the project directory.


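<project name="dbtest" default="dist" basedir=".">

<property name="base.dir" value="."/>
<property name="report.dir" value="report"/>

<!-- JMeter Ant task; ant-jmeter.jar must be in $ANT_HOME/lib -->
<taskdef name="jmeter"
         classname="org.programmerplanet.ant.taskdefs.jmeter.JMeterTask"/>

<target name="dist" depends="runtest,testresults" />

<!-- Run every saved test plan in loadtests/ and log results to a .jtl file -->
<target name="runtest" description="Run jmeter tests">
        <jmeter jmeterhome="/opt/jmeter"
                resultlog="${base.dir}/loadtests/JMeterResults.jtl">
                <testplans dir="${base.dir}/loadtests" includes="*.jmx"/>
        </jmeter>
</target>

<!-- Transform the results log into an HTML report -->
<target name="testresults" description="Report Test Results" depends="runtest">
        <delete dir="${report.dir}" quiet="true"/>
        <mkdir dir="${report.dir}" />
        <xslt in="${base.dir}/loadtests/JMeterResults.jtl"
              out="${report.dir}/JMeterResults.html"
              style="${base.dir}/jmeter-results-report.xsl"/>
</target>

</project>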

Report output from running ant can be found at report/JMeterResults.html

JMeter – Performance Testing Software

Apache JMeter is a 100% pure Java desktop application designed to load test functional behavior and measure performance. It was originally designed for testing Web Applications but has since expanded to other test functions. Specifically it provides complete support for database testing via JDBC.

Some References: Homepage  ·  Wiki Page  ·  User Manual

Initial Installation Steps

$ su -
$ cd /opt
$ wget
$ wget
$ tar xvfz jakarta-jmeter-2.1.1.tgz
$ tar xvfz jakarta-jmeter-2.1.1_src.tgz
$ ln -s jakarta-jmeter-2.1.1 jmeter
$ echo 'PATH=/opt/jmeter/bin:$PATH; export PATH' > /etc/profile.d/
$ . /etc/profile.d/
$ jmeter &

Adding MySQL Support

cd /tmp
tar xvfz mysql-connector-java-3.1.12.tar.gz
cp mysql-connector-java-3.1.12/mysql-connector-java-3.1.12-bin.jar /opt/jakarta-jmeter-2.1.1/lib/

Steps to perform simple MySQL JDBC Test.

1. Launch JMeter
2. Add a new Thread Group (using right click)
3. Define Thread Settings (no messing around: 3 threads x 1000 iterations)
4. Add a Sampler JDBC Request
5. Add initial sample SQL query
6. Add a JDBC Connection Configuration
7. Define JDBC Connection details (I'm using the sakila sample database at this time)
8. Define a Results View
9. Run the sucker

This is just a quick intro to prove it all works. There is quite a lot of reporting output possible, something for my next post.

MySQL Sakila Sample Application

I’m sure you are all aware by now of Mike Hillyer’s MySQL Sakila Sample Database that will be launched at the MySQL Conference. We now have an official MySQL Forum for this as well.

As part of leveraging this existing database, and using it as the basis of my MySQL Conference presentation on MySQL for Oracle Developers, I've released the first version of my MySQL Sakila Sample Application, and I would very much like some feedback on it. Please use the official MySQL Forum for any comments, suggestions or complaints. Please note I am still very much in the planning and design phase.

We also have an Unofficial Wiki that describes a little more of the concept and purpose of the Sample Application, with a call for others to get on board and design and develop their own versions in various languages.

So what does the sample application showcase? For now, work has focused on the presentation of data while we finalise the schema and data. In addition, the application has been designed to be self-documenting: the top menu options describe the functions available, the business logic considerations, the specific MySQL features used, and a schema showing the underlying tables in question. Look at Admin or Film for an initial example.

Here is a quick list of the functions of the MySQL Sakila Sample Application. For now there is no user authentication, so it's open for all to view.

  • Home
  • Customers
    • Index (NF), Search/List, Add, View (NF)
  • Rentals
    • Index (NF), New, Search (NF), Return (NF), Overdue (NF), Out (NF)
  • Film
    • Index, Movie List, Actor List, Categories, Languages
  • Reports
    • Index (NF), Top Film Rentals, Top Customer Rentals
  • Admin
    • Index, Staff, Stores, Countries, Cities

(NF) – Not Functional. Please note that, as the data hasn't been finalised, some of the data is my own stopgap, just to demonstrate functionality.
More information on what’s available is in the Admin|Documentation page.

XP January Meeting

The Brisbane XP Group met yesterday for a presentation by Dr Paul King of Asert on the book Sustainable Software Development: An Agile Perspective.

I found it a good opportunity to get a collective opinion and review of the techniques and methods we are moving towards in software development. Indeed, one key point that better describes Pair Programming has been added to my upcoming conference presentation Overcoming the Challenges of Establishing Service and Support Channels. I'm hoping Paul makes his notes available as a review of this book, which I will also mention in my presentation.

In review, these are some of the key points I got from this presentation.

  • Software gradually degrades over time, and will become a maintenance nightmare
  • Successful software will be changed again and again
  • The IT industry has a problem historically with credibility

So the goal is to move towards Sustainability. Some of the points mentioned by Paul were:

  • Continual refinement
  • A working product at all times (not just working software)
  • Value Defect Prevention over Defect Detection
  • Additional investment and emphasis on design

One point I struggle with is Pair Programming. I don't struggle with the concept; it's great and really works. The struggle is selling Pair Programming as a core XP principle. Some good discussion led to a better angle.

  • Pair Programming – should be de-emphasised as a key point
  • Sell Defect Prevention instead, using Continuous Code Reviews as one method of implementing it
  • Continuous Code Reviews are achieved with Pair Programming

Much easier. The key point is management understands the term Code Reviews, and if you can show the effect of Defect Prevention on support costs, using Pair Programming, Refactoring and other techniques, your sales pitch will be easier.

Also for reference, the book Software Craftsmanship: The New Imperative was mentioned as a book with similar ideals. A third recommended reading book that was mentioned at the meeting was The Pragmatic Programmer: From Journeyman to Master.

Support for Technology Stacks

As part of my next conference presentation, Overcoming the Challenges of Establishing Service and Support Channels, I've been struggling to find, through my professional contacts, any quality organisations that provide full support for a technology stack, for example a LAMP stack or a Java Servlet stack.

Restricted to searching online, I've been impressed by what I've found at SpikeSource, an organisation with an experienced CEO who is well known in the Java industry. They certainly have all the buzzwords covered in their product information.

Benefits of their SpikeSource Core Stack.

  • Fully tested and certified
  • Installs in minutes with integrated installer
  • Enterprise-class maintenance and support available
  • Vendor neutral
  • Horizontally and vertically scalable

SpikeSource offers three prebuilt configurations that can have you up and running in around ten minutes. These configurations comprise the following component choices:

  • LAMP Stack – for Websites with dynamic database-driven content written using Perl and PHP.
  • Servlet Stack – for dynamic Websites written using Java-based Web technologies such as servlets.
  • J2EE Stack – for Web applications that separate Web interface and application logic using Java Servlets and Enterprise JavaBeans.

Supported Platforms: what's of interest here is RHEL and SuSE, as well as Fedora Core 3. This is in line with, for example, Oracle software running under Linux.

What's interesting is that they have MySQL 4.1.14 in their SpikeSource stack (1.6.2), so they are quite some months behind here, especially now that MySQL 5 has been available for 3 months. Beyond the stack itself, their infrastructure supports a large number of open source products and appears to provide a community to enhance the product offerings within the stack. The Spike Developer Zone Components List provides a long list of products.

Their release notes provide good instructions, in particular the configuration used in building the software. For example, here are the MySQL Release Notes, MySQL Quick Start Guide and MySQL Troubleshooting Guide.

They also talk about testing; Core Stack Testing provides more details.

They also claim to provide a VMware Community Virtual Machine that can be run via the free VMware Player on any system without affecting the existing system. This is indeed impressive; however, it doesn't seem to be available. There are many other virtual machines available at the VMware site.

I'm interested to see what else exists in the marketplace for a fully supported technology stack, rather than support for individual components (e.g. RedHat for Linux, MySQL AB for MySQL, JBoss for a servlet container).

In reading comparisons, there is also reference to Source Labs. Anybody who can offer recommendations that I can research would be great.

Book Review – Beyond Java

Well, the title got me when I decided to purchase this book, "Beyond Java – A glimpse at the Future of Programming Languages"; however, perhaps it should have been titled "Why to move from Java to Ruby", as a good portion of the book explains how Ruby solves the problems Java has and the direction Java is moving in. While the book did describe where Java has been, its future limits, and what to look at beyond Java, the heavy use of Ruby to describe these overwhelmed the book. In fact, other than scant descriptions elsewhere, only the last chapter, 20 pages titled "Contenders", gave a comparison of other languages.

Initially I lost count of the number of times information regarding C++ was repeated, and how Java got its great penetration from the C++ community. I almost put the book down after the first few chapters; it was highly repetitive.

However, given my increasing interest in Ruby, I was able to work through this. I could see a Java developer who has already dismissed Ruby as a fad putting this book down. In fact, as a Ruby reference it provided some good tips, again strengthening my case for including Ruby in the title.

Overall an interesting read, however for a small book it could have offered a lot more.

On the same topic, some interesting points in the article The Problems With Java.

Unit Testing A Database

In a recent job interview I was asked about Unit Testing/Automated Testing of a database. An interesting question, and indeed an interesting problem. I thought it was a good topic to describe what I've done in the past, and where I would go for a more complete testing environment given the opportunity of an entire XP project.

This is the approach I have implemented successfully in the past. It’s not a complete solution, however at the time with the client it provided appropriate coverage.

I don't use a framework such as dbUnit to load data via XML, or to test data specifically. XML is an ugly way to store data, and also to maintain and compare it. I start with a pre-configured database of representative sample data (see my notes on this later), and then I use the application's tests to perform the necessary data manipulation. This keeps you as close as possible to testing actual situations, and ensures that any issues the application causes (such as entering bad data, or RI failures) are caught appropriately.

Within this process, an automated build test would first reset the database to a known set of data. I've also found this helps because you can recreate the schema if necessary. As part of Schema Design in XP development, I have two ways to create the database schema. You can create it from scratch, so there is always appropriate SQL to create the current version of the schema, let's say BUILD_102. Alternatively, you can always upgrade between releases, for example from BUILD_101 to BUILD_102, with the appropriate upgrade scripts. Upgrade or patch scripts only move one version to the next version. In a production environment it's not possible to simply recreate your schema for each release; for testing and training, however, you can. It's also pointless after 50 releases to have to apply 50 patch scripts to the original source schema for every automated build test.

This does lead to two paths necessary for creating a schema, but this can also be tested adequately in an automated way.
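The two schema paths can be sketched as a small helper that decides which scripts to run. The script naming convention (create_BUILD_n.sql, upgrade_BUILD_n_to_BUILD_m.sql) and the class name are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper choosing schema scripts for the two creation paths. */
public class SchemaPath {

    /** Full create script for one build (naming convention is an assumption). */
    static String createScript(int build) {
        return "create_BUILD_" + build + ".sql";
    }

    /** Upgrade script that moves exactly one version forward. */
    static String upgradeScript(int from) {
        return "upgrade_BUILD_" + from + "_to_BUILD_" + (from + 1) + ".sql";
    }

    /** Path 1: build the current version from scratch (testing, training). */
    public static List<String> freshInstall(int target) {
        return Collections.singletonList(createScript(target));
    }

    /** Path 2: apply one-step patch scripts in order (production). */
    public static List<String> upgradePath(int installed, int target) {
        if (installed > target) {
            throw new IllegalArgumentException("downgrades are not supported");
        }
        List<String> scripts = new ArrayList<>();
        for (int build = installed; build < target; build++) {
            scripts.add(upgradeScript(build));
        }
        return scripts;
    }
}
```

An automated build could run both paths, for example freshInstall(102) on one database and upgradePath(101, 102) on a copy of BUILD_101, and then compare the resulting schemas; that is how the second path gets tested too.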

I also split the application test suites that use the sample data into two buckets: Destructive and Non-Destructive. The reason is that Non-Destructive tests (i.e., read-only, no INSERT, UPDATE or DELETE) can be re-run as many times as necessary, while Destructive tests (i.e., DML statements) can only be run once before the database must be restored. Of course you can use setUp() and tearDown() within JUnit, but it's cleaner if you can extract this to a higher level, making the unit tests simpler. By also running tests that don't continually use the same data, or build the data through test execution, you get better coverage of different data sets. To give an example, you could create a test that created a row of data, then another that edited the same row, then one that deleted it. These are valid, but if the first test fails, how do you know whether the update and delete tests are also broken? They are dependent by default and will fail, but are they really broken? If, with your sample data, you created a new row, edited a different pre-configured row, and deleted yet another pre-configured row, you would eliminate the dependencies.
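A minimal sketch of the Destructive/Non-Destructive split, using plain Java rather than JUnit so it runs stand-alone. An in-memory map stands in for a customer table and reset() stands in for restoring the database; the class, method and row names are hypothetical. Note that testUpdate() and testDelete() work on different pre-configured rows from the one testCreate() adds, so each can pass or fail on its own.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: independent destructive tests against pre-configured sample data. */
public class SampleDataTests {

    // Stand-in for a customer table: id -> name
    private final Map<Integer, String> customer = new TreeMap<>();

    /** Restore the known, pre-configured rows (hypothetical sample data). */
    public void reset() {
        customer.clear();
        customer.put(1, "Mickey Mouse");
        customer.put(2, "Donald Duck");
        customer.put(3, "Daffy Duck");
    }

    // Non-destructive: read only, so safe to re-run any number of times
    public boolean testFindDonald() {
        return "Donald Duck".equals(customer.get(2));
    }

    // Destructive: each test touches a different row
    public boolean testCreate() {
        return customer.putIfAbsent(4, "Marvin the Martian") == null
                && "Marvin the Martian".equals(customer.get(4));
    }

    public boolean testUpdate() { // edits pre-configured row 1, not the new row 4
        return customer.replace(1, "Mickey M. Mouse") != null
                && "Mickey M. Mouse".equals(customer.get(1));
    }

    public boolean testDelete() { // removes pre-configured row 3, not the new row 4
        return customer.remove(3) != null && !customer.containsKey(3);
    }

    /** Runs the suite and returns the number of passing tests. */
    public int runSuite() {
        reset();
        int passed = 0;
        for (int run = 0; run < 3; run++) { // non-destructive tests can repeat
            if (testFindDonald()) passed++;
        }
        if (testCreate()) passed++;         // destructive tests run once each
        if (testUpdate()) passed++;
        if (testDelete()) passed++;
        reset();                            // restore before the next suite
        return passed;
    }
}
```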

Now of course there are situations where data must be specifically checked at the database level: for example, it may never be displayed in your application, it may be intermediate information that is then summarised for display, it may be internal audit information held against data (for example Create User, Last Updated User), or it may be data created by procedures or triggers. And even when application testing can verify the data in the user interface, there are situations where you want to verify it at the source.

To this end I have a custom-written JUnit extension that can execute specific SQL statements and compare the results. I'll need to write about it and make it available at a later time (when I can dig it up).

Sample Data

Sample data in the database is pre-configured, not in XML files, so it can be managed by more primitive means: a database GUI, SQL flat files, or even text files. Why have pre-configured data this way? A few reasons.

  • It’s not coupled to your tests in any way, so it can be reused, for example as Training Data.
  • You can more easily use database-specific tools, say for loading the data in a relational way.
  • You can use the same database-specific tools to export the data easily if, say, you use an application to modify certain information.
  • You could more easily incorporate legacy data that is also being migrated if you use the same database specific tools.

Granted, XML is universal in its data representation and more self-descriptive, but it's a real pain to edit manually, and it's very verbose when simpler, more primitive methods exist for this type of data management.
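If the sample data lives in tab-delimited text files, MySQL's own LOAD DATA INFILE can read them directly (tab is its default field separator). Where a portable SQL flat file is preferred, a small generator along these lines is one option. It is a hypothetical sketch: it assumes the first line holds column names, quotes every value as a string, and only doubles single quotes, so it does not handle backslashes (which MySQL treats as escape characters by default) or NULLs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: tab-delimited sample data to SQL INSERT statements. */
public class FlatFileToSql {

    /** Quote a value as an SQL string literal by doubling single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** lines.get(0) holds the column names; every other non-empty line is one row. */
    public static List<String> toInserts(String table, List<String> lines) {
        String[] columns = lines.get(0).split("\t");
        List<String> statements = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // limit -1 keeps trailing empty fields, so column counts stay accurate
            String[] values = line.split("\t", -1);
            if (values.length != columns.length) {
                throw new IllegalArgumentException("column count mismatch: " + line);
            }
            StringBuilder sql = new StringBuilder("INSERT INTO ").append(table)
                    .append(" (").append(String.join(", ", columns)).append(") VALUES (");
            for (int i = 0; i < values.length; i++) {
                if (i > 0) {
                    sql.append(", ");
                }
                sql.append(quote(values[i]));
            }
            statements.add(sql.append(");").toString());
        }
        return statements;
    }
}
```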

So we are creating a pre-configured data set, and an extensive one when possible. As I mentioned, the re-use capabilities for training or demonstrations really works.

Training Data

I have successfully used a Cartoon Environment for the sample data with a number of systems, specifically CRM implementations. There are a few reasons for this. First, most people I've ever met can relate in some way to some part of the data. If they can't, they can read up online or watch a movie, and get an appreciation of the representative data set. Effectively I'm leveraging the time and effort of others here, which is much better than a nondescript set of data.

You have the cartoon characters (e.g. Mickey Mouse, Donald Duck, Daffy Duck, Marvin the Martian, The Simpsons, The Flintstones, The Jetsons); you can use the streets and rides at Disneyland as addresses, the animators as the users (e.g. Walt Disney, Chuck Jones, Stan Lee), the different studios (e.g. Warner Bros, Disney, Pixar) as different states or countries, and shows or movies (e.g. Toy Story, Shrek) to group characters in other ways.

With this type of data, common attributes such as birth dates, family units, nicknames and deceased people are all part of the available data. It's surprising how much information, such as full names, addresses and interests, you can find when using The Simpsons, for example.

It's impressive when the CEO of a company can show the application to overseas business partners: his knowledge of the application (from his management perspective) is sufficient, and no knowledge of the data is really necessary to use or explain it, as the data is commonly used and generally understood.

At this point I would like to correctly acknowledge the registered trademarks of Disney, Warner Bros, Hanna-Barbera, Pixar and DreamWorks, and to state that I am not using the names for any profit.


So this is what I've used in the past. What would I do in the future if I were charged with bulletproof testing of a database, even independently of an application, effectively 100% test coverage of the data? Well, this is an unproven approach, but I'd relish the opportunity to give it a full-blooded test one day.

How to test the database with an automated test approach.

I'd consider breaking testing down into 3 areas:

  1. Schema
  2. Data
  3. Business Logic/Referential Integrity

Each of these effectively builds on the preceding points.

Schema

This would be quite straightforward, as it's a flat comparison between schemas, which could be managed via the product's data dictionary tables using SQL (for example INFORMATION_SCHEMA in MySQL 5). You could even compare 2 schemas in a few simple SQL statements. You could also export the schema definitions and then compare flat files. You will find some downsides to these approaches; ordering is a big thing, as the order of columns within a table, or of the tables that are exported, may not be guaranteed. However, given appropriately defined standards and use of tools, this comparison could easily occur.

Verifying patches between releases, as well as fully installed schemas, is also possible. The schema is the easy part.
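An order-independent schema comparison could look something like this sketch. It assumes each schema has been flattened to one "table.column type" string per column, for example from MySQL 5's INFORMATION_SCHEMA.COLUMNS; putting the entries into sorted sets sidesteps the ordering problem mentioned above. The class name is hypothetical.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch: order-independent comparison of two schema definitions.
 * Entries could come from a MySQL 5 query along the lines of:
 *   SELECT CONCAT(TABLE_NAME, '.', COLUMN_NAME, ' ', COLUMN_TYPE)
 *   FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'sakila';
 */
public class SchemaDiff {

    /**
     * Returns the sorted differences: "- x" for entries only in expected,
     * "+ x" for entries only in actual. An empty result means a match.
     */
    public static Set<String> diff(Collection<String> expected, Collection<String> actual) {
        Set<String> exp = new TreeSet<>(expected);
        Set<String> act = new TreeSet<>(actual);
        Set<String> result = new TreeSet<>();
        for (String e : exp) {
            if (!act.contains(e)) {
                result.add("- " + e);
            }
        }
        for (String a : act) {
            if (!exp.contains(a)) {
                result.add("+ " + a);
            }
        }
        return result;
    }
}
```

Because column order is ignored, add ORDINAL_POSITION to each entry if column position matters to you. Running this against a freshly created BUILD_102 and a BUILD_101 that has been upgraded to BUILD_102 is one way to check that the patch scripts and the full create script agree.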

Data

Data could be tested in various ways: counts, sums and sample comparisons. But it's also just data, so why not md5sum the entire data set? Why not even dump the data to flat files and use basic difference tools for comparison? That's one simple approach, especially if you are loading data and then using or manipulating it; you can then export and compare at a file level. This would work very well for data considered read-only for the life of testing.
This format may also allow you to compare data between two different database products, e.g. Oracle used for your online transaction processing, and MySQL used for web data or a management summary reporting application.
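The md5sum idea can be sketched in Java as a fingerprint over a table dump with one row per line. Sorting the rows first makes the digest independent of export order, which addresses the same ordering issue that affects schema exports. This is an illustrative sketch with a hypothetical class name; for plain ASCII dumps with newline line endings, it should match running sort (with LC_ALL=C) and then md5sum over the same file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: an order-independent MD5 fingerprint of a table dump. */
public class DataFingerprint {

    /** rows: one dumped row per element, e.g. tab-separated column values. */
    public static String md5OfRows(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        Collections.sort(sorted); // export order no longer matters
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (String row : sorted) {
                md.update(row.getBytes(StandardCharsets.UTF_8));
                md.update((byte) '\n'); // terminate each row like a text file line
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }
}
```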

In order to test the data you need the schema, but how can you test the data without the business logic and Referential Integrity? Within MySQL you can easily disable foreign key constraints (SET FOREIGN_KEY_CHECKS = 0), or change the table type to one that ignores this syntax (MyISAM parses FOREIGN KEY clauses but does not enforce them). This could allow you to run tests with and without Referential Integrity to determine the strength of your application.

Of course this is a static version of the data. Performing separate testing of DML statements directly against the database could prove a waste of time, unless your application's database layer was written as a complete API. Even then you would only be simulating what your application ultimately does, so it could be overkill. You could instead compare against known results after a successful run of automated build testing.

Business Logic/Referential Integrity

This is the hard stuff. You could be halfway there with adequate schema and data testing; at least you are then confident that core integrity exists.

The problem is also to do with application integrity versus database integrity.

Let’s take a percentage, which goes from 0 to 100. Using MySQL, for example, you would define this as TINYINT(3) UNSIGNED, giving you a valid range of 0-255 and a display width of 3 characters (the (3) is just beautification; it doesn’t limit the values stored).

Your application logic restricts the value in this column to 0 to 100. But do you enforce this at the database? It depends on your needs. If the application is the only way to insert and maintain data, then you could get away with it; if data can be managed from other external systems, you may have APIs that also need to enforce it. What if you grant SQL access to DBAs; could they accidentally mess it up? That song “It’s a fine line between pleasure and pain” comes to mind at this moment. What I’m getting at is that in database-only testing you could easily insert 255 into a percent column and pass a number of data-specific tests. I’d expect it to fail some as well, otherwise your tests aren’t complete, but through the application you could never test 255, as the client would never allow it.
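A small sketch of that gap, with hypothetical names: the column type accepts 0-255, the business rule allows 0-100, and a database-level test should look for values that fall between the two rather than only checking the type’s range. It assumes no database-level constraint is in place.

```java
import java.util.Arrays;

// Hypothetical names; the ranges are the ones discussed above.
final class PercentRules {
    static final int COLUMN_MAX = 255;  // TINYINT UNSIGNED stores 0-255
    static final int PERCENT_MAX = 100; // business rule enforced by the application

    // What the column itself will accept (no database-level constraint assumed).
    static boolean fitsColumn(int value) {
        return value >= 0 && value <= COLUMN_MAX;
    }

    // What the application lets a user enter.
    static boolean isValidPercent(int value) {
        return value >= 0 && value <= PERCENT_MAX;
    }

    // Stored values the database accepted but the business rule rejects,
    // e.g. rows loaded through an external API or by a DBA using SQL directly.
    static int[] violations(int[] storedValues) {
        return Arrays.stream(storedValues)
                .filter(v -> fitsColumn(v) && !isValidPercent(v))
                .toArray();
    }
}
```

A data test built on violations() would catch the 255 that the application itself could never produce.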

There are a lot more issues in RI testing, such as cascading UPDATE and DELETE rules. And once you have worked all that out, you have to start on triggers and stored procedures. I’m not going to spend any time on those here.

Unforeseen Side Effects

Data is a strange beast. It’s the source of information, so I always like to go back to the data for comparison, however the lack of good data (most notably in Legacy Systems Migration) can drive you mad. What good is it to have a new system but not be able to enforce an adequate level of Referential Integrity or business logic due to incomplete historical data? In the time I’ve supported large systems, a good portion of the development cost in support has come from bad data and/or the need for a simple application to have more complex rules to cater for so-called incomplete data.

I’ll give you a trivial example: gender. In the new system an organisation will always ensure they get the gender of a customer (let’s not wonder how they do it, it’s just an example). So the application is designed to support Male/Female, and Reference Data may exist to translate M and F to Male and Female respectively, with the codes used for data storage efficiency. Check constraints or enumeration data types (which I don’t like) may exist. Reports may do side-by-side comparisons.

Now the company buys a competitor, and then gets their database of 500,000 customers, but they don’t record the gender. Do you then relax all your great integrity? Do you introduce a gender of Unknown? But that’s only for display; the maintenance screens can’t allow you to select it, so you then need a different level of Reference Data management. Do you make an educated guess and correct the data? Does the customer do an expensive mailout campaign and data collection process to correct this information? So what’s the big deal anyway, the customer asks? Well, if it’s an organisation that sells hygiene products, you don’t want to be sending out material on Shaving Cream to women on your list and Body Wax to men. However, you can’t use a purely database-driven reason to justify to the customer the cost of introducing this data. How do you show business value to the customer when they simply want the data available?
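A sketch of the “different level of Reference Data management”, modelled as rows of a hypothetical GENDER reference table rather than a database enumeration type. The U/Unknown row and the selectable flag are my assumptions about how you might handle the acquired customers: reports and enquiry screens can translate U, but maintenance screens only offer Male and Female.

```java
import java.util.ArrayList;
import java.util.List;

// One row of a hypothetical GENDER reference table.
final class GenderCode {
    final String code;
    final String label;
    final boolean selectable; // may users pick it on maintenance screens?

    GenderCode(String code, String label, boolean selectable) {
        this.code = code;
        this.label = label;
        this.selectable = selectable;
    }
}

final class GenderReference {
    // 'U' is the hypothetical row added for the migrated customers.
    static final List<GenderCode> ROWS = List.of(
            new GenderCode("M", "Male", true),
            new GenderCode("F", "Female", true),
            new GenderCode("U", "Unknown", false));

    // Reports and enquiry screens: every stored code must translate.
    static String label(String code) {
        for (GenderCode row : ROWS) if (row.code.equals(code)) return row.label;
        throw new IllegalArgumentException("No reference data for gender code " + code);
    }

    // Maintenance screens: only the values a user is allowed to choose.
    static List<String> selectableLabels() {
        List<String> labels = new ArrayList<>();
        for (GenderCode row : ROWS) if (row.selectable) labels.add(row.label);
        return labels;
    }
}
```

The cost the customer pays for is exactly this split: every screen and report now has to know which of the two lists it should use.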

I know this is a trivial example, but if I had a dollar for every trivial problem that customers spent months on, versus the really hard problems, I’d be writing this from a much more comfortable and relaxing resort haven (a ski resort rather than a beach resort).


So can you Unit Test a Database without an application? Yes. Would you want to? Maybe, to a certain degree, depending on your type of data. If all your information is highly visible in data entry and data retrieval, you should couple testing more closely with your application. If your data is largely generated and collated, rarely user entered but bulk loaded, such as Sensis or GIS information, then dedicated testing of all aspects of the database, decoupled from the application, could indeed make your application testing easier, because it’s easier to identify bad data that the application creates.

Database Modelling within an XP Methodology

In an eXtreme Programming (XP) Agile approach to software development, the absence of adequate database design, or scant regard for it on the assumption that a framework and persistence infrastructure will take care of it, can be a disaster in a larger enterprise solution. In essence it’s a scaling effect. A smaller system normally has fewer users, less functionality and a smaller volume of data, so inefficiencies in database design are masked by acceptable performance. When you scale up the system, or design a large enterprise system, the effect multiplies quickly. Of course, with solid XP practices, making changes and integrating them will be easier, but the amount and complexity of the changes may be significant.

A more pragmatic approach is necessary in Database Modelling and Design, especially in a larger enterprise solution when using XP or an Agile approach. Assuming the Relational Database/s have been chosen, greater care, advance preparation and planning are required. Purists could argue YAGNI (You Aren’t Gonna Need It), but ultimately the customer will be distraught if the system is perfect in functionality and user interface, and all is well in testing, demonstrations and customer training, but it can’t handle a production load of users or a gradual growth in users. The other catch is the need for additional disk space to be added monthly due to unknown requirements.

Additional considerations such as legacy systems data migration, database sizing, database growth, performance requirements, number of users etc. can’t conform to a traditional XP approach. These tasks require varying lead times; for example, the purchase and configuration of hardware and software need to be woven into the XP approach.

XP is not for every project, and in the instances I’ve been involved with, certain aspects have had to be adjusted or tailored to the specific project because of the environment, the customer and, usually, management. I’m not stating that an XP approach can’t apply to large-scale enterprise Database Modelling, more that some adjustments, particularly within the Planning Process, are needed to integrate a more balanced solution.

The database is the foundation. When discussing this with friends, I draw an analogy to building a house. If you don’t get the core foundation correct, the slab, the essential primary fittings of plumbing, power etc., and a floor plan of key, important and known things, you will forever be spending additional time and resources propping it up while building your house, taking away time, money and energy from the significant part of building the house so it can ultimately be used. (Thought: I wonder how you would build a house XP style. What’s the most important part for the customer? Something to ponder one day.)

Are foundations perfect? No. Do they change and adapt? Yes, they do. However, the cost of changing them is significantly higher, so the investment in getting them right the first time is invaluable.

I can’t count the number of times I’ve come onto an existing project and the Database Designer, for lack of a better word, is a novice. It makes me shudder. It’s an expertise I’ve specialised in, however I’ve had to broaden my skill set, as this task is not needed throughout a traditional SDLC project of 12-24 months, and it really saddens me when simple 101 mistakes are made, the downstream impact is significant, and management don’t know or realise the impact. Even worse is when I tell them what’s needed for a correction, they look at the impact, and they decide that this fixed, known cost of correction is worse than any projected unknown downstream costs of maintenance and future development.

Ultimately, I’ve got my way on projects more easily when I’ve also focused on Application Performance Tuning, where I look at the application’s needs rather than just the hard-core DBA or System Administration tuning. A DBA doesn’t really care about the application and end-user impact; they care about the figures in the database. In this situation, low-level structural changes and their associated costs are weighed against the application performance ultimately necessary for the end user. I remember one project where the development/user/testing/release team took 3 months (probably ~200 man weeks of work) to implement and deploy a key structural change that I had identified and proposed as part of a longer performance analysis of the system. Of course, with 6 full-time DBAs alone, 33 remote distributed systems and 1000s of users, the resulting impact and the future improvements possible were worth the investment. Avoiding a hardware upgrade alone made it worthwhile. This project was also over 10 years ago, and a lot of techniques have changed since then. Anyway, back to the point of the discussion.

Let me give you an example of a traditional XP approach to software development, and how to weave in a more structured database design and modelling approach.

User Stories

As part of the gathering of initial User Stories from the customer, the Database Designer should be reviewing these and beginning to build a high-level Logical Data Model. The Database Designer is not needed in any initial interaction with the Customer; however, early reviews drawing on historical experience may clearly indicate gaps in stories that can then be fed back to the customer for consideration. Initially it should just be on paper or a whiteboard, as at this stage it should match the high-level nature of the User Stories and their fluid, changing nature. However, it should immediately highlight known key entities, key relationships between entities, key integrations with external systems, and areas that need early volume estimates of data, perhaps by comparison with existing legacy systems.

For example, a rates billing system I was involved in had ~400,000 clients billed quarterly (4 times a year), and each bill had on average 10 line items (just looking at a small sample, and getting a figure from the legacy system). Now it doesn’t take much to estimate the size of tables that would record billing history (400,000 x 4 x 10, or 16 million rows per year), or GL reconciliation over 5 years (400,000 x 4 x 10 x 2 x 5, that’s 160 million rows). Disk storage can be guesstimated once a number of key entities are identified, even without any hint of the average row length or indexes. Of course it could be out by a factor of 2, 5 or 10 at this early stage, but not by a factor of 100.
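The arithmetic above, as a throwaway Java sketch. The parameter names are mine, and the text doesn’t say what the factor of 2 for the GL represents, so I’ve modelled it as a hypothetical entriesPerLine (for example a debit and a credit per line). The 100-byte average row length in the usage note is purely an assumption.

```java
// Back-of-envelope volumes for the rates billing example (hypothetical names).
final class BillingVolumes {
    static long billingRowsPerYear(long clients, int cyclesPerYear, int linesPerBill) {
        return clients * cyclesPerYear * linesPerBill;
    }

    // entriesPerLine stands in for the unexplained factor of 2 (an assumption).
    static long glRows(long clients, int cyclesPerYear, int linesPerBill,
                       int entriesPerLine, int years) {
        return billingRowsPerYear(clients, cyclesPerYear, linesPerBill) * entriesPerLine * years;
    }

    // Raw data only: no indexes, no storage-engine overhead.
    static long roughBytes(long rows, int avgRowBytes) {
        return rows * avgRowBytes;
    }
}
```

With billingRowsPerYear(400_000, 4, 10) giving 16 million and glRows(400_000, 4, 10, 2, 5) giving 160 million, an assumed 100-byte row puts the GL at roughly 16 GB of raw data before indexes. Even out by a factor of 2 to 10, that tells you which order of magnitude of disk to plan for.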

In a situation like this, for any entity, even at the first mud map, I would be recording a number of indicators where possible. These would be:

  • Initial number of rows
  • Annual growth in rows
  • Projected rows after ‘x’ years
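Those three indicators fit naturally into one row of a mud-map volume sheet. This sketch assumes linear growth (a fixed number of new rows each year), which is usually good enough for a first pass; EntityEstimate is a hypothetical name.

```java
// One row of a hypothetical mud-map volume sheet, assuming linear growth.
final class EntityEstimate {
    final String entity;
    final long initialRows;      // rows at go-live, e.g. migrated from the legacy system
    final long annualGrowthRows; // new rows added each year

    EntityEstimate(String entity, long initialRows, long annualGrowthRows) {
        this.entity = entity;
        this.initialRows = initialRows;
        this.annualGrowthRows = annualGrowthRows;
    }

    // Projected rows after the given number of years.
    long projectedRows(int years) {
        return initialRows + annualGrowthRows * years;
    }
}
```

For the billing example, a bill line entity starting empty and growing by 16 million rows a year projects to 80 million rows after 5 years.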

In addition, I’d roughly flag each entity by type of access and volume of transactions. This helps identify key performance considerations that may impact the Database Design in areas such as indexes (i.e. disk space) and schema optimisations after normalisation.

  • OLTP
  • Batch

For example, the billing process that creates quarterly bills and inserts millions of rows is a Batch process run once every three months, as opposed to new accounts, which is OLTP and happens in regular quantities every day, during the day.

Now, in the mud map these figures don’t have to be accurate. In fact, in the billing system example, if a table didn’t have 10,000 rows it didn’t rate a mention, and 10,000 to under 400,000 was considered small, the upper limit being the volume of the key entity. This is simply because a number of tables will be a multiple of this key figure, and they will swamp any insignificant tables. Again, this is week 1 of a potential 52-week project; it can afford to be vague in areas.
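The triage rule above can be sketched in a few lines. The 10,000-row floor and the use of the key entity’s volume as the ceiling for “small” come from the billing example; the enum names, including LARGE for anything at or above the key entity, are my own labels, and the OLTP/Batch flag is simply carried alongside.

```java
// A sketch of the rough volume triage (hypothetical names).
final class VolumeTriage {
    enum Access { OLTP, BATCH }
    enum Band { NOT_WORTH_A_MENTION, SMALL, LARGE }

    static final long FLOOR_ROWS = 10_000;

    static Band band(long rows, long keyEntityRows) {
        if (rows < FLOOR_ROWS) return Band.NOT_WORTH_A_MENTION;
        if (rows < keyEntityRows) return Band.SMALL;
        return Band.LARGE; // at or above the key entity's volume
    }

    // One line for the mud map, e.g. "Bill line: LARGE, BATCH".
    static String describe(String entity, long rows, long keyEntityRows, Access access) {
        return entity + ": " + band(rows, keyEntityRows) + ", " + access;
    }
}
```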


How do you term this in an XP environment? Call it a Spike. The purpose of a Spike is to explore options and evaluate risk. While in general the practice would be to throw away the results, in this case these Spikes should be kept and built on progressively.

Release Planning

As part of the Customer’s User Stories being estimated and prioritised within Release Planning, more refined Logical Modelling and even Physical Modelling are necessary. Of course, until the customer refines priorities based on estimates, the actual order of implementation can’t be confirmed.

It is possible, however, that a number of stories within an iteration may not relate to each other within the Database Model. How do you address this? Even though the actual attributes of these logical entities are not yet available, a physical model can be built with the necessary tables and relationships to ensure the functionality can be used in practice.

Should the need for missing relationships within a Database Model impact the priorities of User Stories? No. Estimates should accurately reflect if additional work is needed for any given User Story, hence the Database Designer is necessary in the estimation process.

Iteration Planning

As part of the Iteration Planning it is key that the physical Database Requirements are in place to enable Writing Unit Tests and Writing Code for each Development Story for the Iteration. This should occur at the start of the Iteration, but not before.

Daily Stand Up Meeting

The Daily Stand Up Meeting is the best opportunity to advise of structural impacts, or for developers to request more involvement by the Database Designer in their specific task.

No system is perfect, and there will be times when the Database Model does not reflect a coder’s requirements for a task. At that point the developer is responsible for adjusting their own instance of the Database Model to complete the coding task. The issue is: does the coder have the ability to modify the database schema when checking in? No. The Database Model is the responsibility of the Database Designer, a difference from the normal XP practice of collective code ownership, where any coder can modify any code. The next section describes the reasons in detail. At this point there is an inconsistency within the repository, and any automated testing for a build will fail. And this is what should happen. From the developer’s perspective their instance of the code is correct, but for the iteration as a whole it is not.

One could argue this is not valid; however, the cycle of Test, Code, Refactor, Check In, while applying to the individual task, should also apply to the Database Design, but at an iteration level rather than a task level. Having your continuous builds break will raise a lot of red flags, and I’m sure this approach will ensure that the broken tests get corrected through the schema definition (the code): there will be some refactoring if, for example, the developer’s change didn’t serve the bigger picture, the schema is checked in, and the continuous build bar goes green. Yippee. The bar is green, the code is clean, we can all go home now.

The Database Designer

The Database Designer should be a dedicated resource in the development team for this task. For example, in a team of 6-10 developers you would have 1 or 2 people. Even in a team of 2 you would have 1 person responsible. Unlike coding, where any members of the team pair up and work on tasks within the Iteration, the Database Designer team is responsible for Database Design, just as the Development Team is responsible for Coding. There are a few reasons for this:

  • They are ultimately responsible for the structure, the foundation, and this is the bigger picture that includes visibility and scope of further iterations and releases.
  • The skills are more specific.
  • They should also be a part-time developer in the team, so as to best understand the dynamics and genuinely be part of the team.

Correspondingly, the end of the iteration should include an additional, schema-related code review. A difference report of the schema definition at the start and end of the iteration should be reviewed to ensure best practices in the larger picture, as well as standards, optimisations, future performance considerations (e.g. indexes) and future disk requirements (e.g. adding a new index to the 160 million row GL table will take 4GB of disk space).
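The 4GB figure is the kind of estimate a reviewer can make on the spot: rows multiplied by an average index entry size. The sketch below assumes a fixed 25 bytes per entry (key columns plus row pointer plus overhead), chosen because 160 million entries at 25 bytes is 4 GB; real sizes depend on the key columns and the storage engine, and IndexSizing is a hypothetical name.

```java
// Back-of-envelope index sizing (hypothetical helper; bytesPerEntry is an assumption).
final class IndexSizing {
    static long indexBytes(long rows, int bytesPerEntry) {
        return rows * bytesPerEntry;
    }

    // Decimal gigabytes, as disk vendors quote them.
    static double gigabytes(long bytes) {
        return bytes / 1_000_000_000.0;
    }
}
```

Doubling the key width roughly doubles the answer, which is exactly the kind of consequence the end-of-iteration schema review should surface.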

Other Factors

In fact, in my upcoming conference presentation, Overcoming the Challenges of Establishing Service and Support Channels, I spend some time discussing Data Quality. Quite often this is the bane of support, due to the complexity of the software needed to cater for data exceptions or, most commonly, data anomalies from historical data not meeting the minimum RI and data specifications of your new system.

Book Review (Part 1) – Better, Faster, Lighter Java

Well, if the weight of the book has anything to do with it, it’s the lightest Java book I’ve got. Better, Faster, Lighter Java, which I got from Amazon, has been a quick read. I’ve done a quarter of the book (60 pages) in one bedtime reading session. There’s some good information; I’ll provide a review when I’ve finished the book. What’s surprising is that in the content that can be verified purely as programming (i.e. the code), there were already a number of errors. Here’s a summary of the comments I’ve already sent to the publisher (just showing the technical stuff).

Example 1-1. Counter example: implementation (pages 3-4)

Point 1:
Book: Mid page (page 3), you have public abstract Long getID();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following lines (page 3)
public abstract void setID(Long id);
public Object ejbCreate(Long id, int count)
public void ejbPostCreate(Long id, int count)

Example 1-2. Local Interface (page 5)
Point 2:
Book: Top of Page (page 5), you have public abstract Long getId();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line (page 5)
public abstract void setID(Long id);

Example 1-3. LocalHome interface (page 5)
Book: 4th last line (page 5), and 3rd last line.
Comment: Same comment as Points 1 & 2. ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line (page 5)
public abstract void setID(Long id);

Example 1-4. Transparent counter (pages 13-14)

Point 3:
Book: On the second last line of page 13, you have private string name;
Comment: ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Point 4:
Book: On the first line of page 14, you have public void setName(long newName) {
Comment: ‘long’ should indeed be ‘String’

Point 5:
Book: On the fourth line of Page 14, you have public string getName() {
Comment: As per point 3. ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Figure 2-1 (pages 18-19)

Point 6:
Book: You list 7 points that correspond to the numbers in Figure 2-1. Point 7. Easier to Maintain
Comment: You have no point 7 in your figure.

Unreferenced Code (page 25)

Point 7:
Book: Second line of code in section. String prefix "This code is ";
Comment: You are missing the necessary assignment operator = (equals) between prefix and the string literal "This code is ".

Unreferenced Code (2nd Example) (page 25)
Point 8:
Book: Lines result = result + "much, "; and result = result + "simpler, and neater.";
Comment: While this is correct, it would be simpler still to replace result = result + with result += on both lines. It would not really have been worth a mention, except that you are explicitly trying to demonstrate “simpler and neater”.

Unreferenced XML (page 32)

Point 9:
Book: In the middle of the code you have the line <include name="**/*Test.class" />
Comment: While this is valid, it would not work with your present example, which you are attempting to automate with Ant. In your example, you name your JUnit test case class TestAdder, and this pattern would not include it. It should be **/Test*.class, with the third * (asterisk) following Test rather than preceding it.

Unreferenced Code (page 45)

Point 10:
Book: Second line of the code section: Account valueObject;
Comment: You don’t actually use this variable; while not an error, it is unnecessary. Refer to the next point for more information:

Point 11:
Book: Middle of code: account.setAccountNumber(…..), and the following line account.setBalance(….
Comment: You define no Account variable called ‘account’, so you do in fact need a declaration such as Account account; at some point. Even this, however, is invalid, as you have not obtained an Account object on which to call the setter methods setAccountNumber() and setBalance(), so you would need Account account = new Account(); This is also invalid, as your earlier code (pages 43-44) that defines the Account class has no no-args constructor. Java only supplies an implicit default constructor when a class declares no constructors at all, so once Account declares a constructor taking arguments (such as the one used below), new Account() will not compile. Either way, it’s not clean, sound code in regards to new Account().
And to complete these 2 lines: setAccountNumber() is not a defined method of Account; indeed, you explicitly omit it, as your own comment says (page 43: Remembering your requirements, you want to keep the account number private, so you scope it accordingly, and omit the setter).

All that being said, you could do the following as an alternative to these lines.

Account account = new Account(result.getString("accountNumber"), (float)result.getDouble("balance"));
return account;

This could be simplified further to:

return new Account(result.getString("accountNumber"), (float)result.getDouble("balance"));

Nice and clean.

Point 12:
Comment: while on the subject of this code example, and given the changes necessary, I’d make two other comments regarding this code fragment.
Firstly, you close your stmt variable with stmt.close() but you don’t close your ResultSet variable result. Good coding practice would close both, with appropriate error checking as shown with stmt.
Secondly, although this method is static, I would not choose to use a static (global) Connection variable referenced as conn. This should be passed to the method as an argument.
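Points 11 and 12 together might look like the sketch below. Account here is my own minimal stand-in for the book’s class (an account number set only through the constructor, no setter); the SQL, the table name and the AccountFinder class are hypothetical, and I’ve swapped the book’s Statement for a PreparedStatement. It also uses try-with-resources (Java 7 and later), which closes both the ResultSet and the statement even when an exception is thrown, in place of hand-written close() calls.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal stand-in for the book's Account: no setter for the account number.
final class Account {
    private final String accountNumber;
    private final float balance;

    Account(String accountNumber, float balance) {
        this.accountNumber = accountNumber;
        this.balance = balance;
    }

    String getAccountNumber() { return accountNumber; }
    float getBalance() { return balance; }
}

final class AccountFinder {
    // Builds an Account from the current row; column names follow the book's snippet.
    static Account fromRow(ResultSet result) throws SQLException {
        return new Account(result.getString("accountNumber"), (float) result.getDouble("balance"));
    }

    // The Connection is passed in rather than read from a static field.
    // Table and column names are assumptions for illustration.
    static Account find(Connection conn, String accountNumber) throws SQLException {
        String sql = "SELECT accountNumber, balance FROM account WHERE accountNumber = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, accountNumber);
            try (ResultSet result = stmt.executeQuery()) {
                return result.next() ? fromRow(result) : null; // null when not found
            }
        }
    }
}
```

Keeping the row mapping in its own method also makes it testable without a database.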

Of course, I’ve only looked at the first 1/4 of the book during some bedtime reading, and I haven’t actually run this code through a compiler, but I wanted to bring these comments to your attention. I am, however, enjoying the content of this book so far.

The Java Spring Framework

I’ve been reading Spring in Action as part of reskilling in the Spring Framework and Hibernate. The rationale was that I wanted better testability for my web apps, and after reviewing a number of options and getting input from colleagues, I went down the Spring path.

Now, Spring throws a lot of new terms at you, such as Aspect Oriented Programming (AOP) and Inversion of Control (IoC), and it takes some time to get into. A Hello World example is not a simple thing; it has a number of moving parts. Still, no pain, no gain. The obvious change in this development path was a significant increase in XML, which started to concern me.

After some more reading and examples, I became increasingly worried that I’d opened a can of worms. I chose Spring to ensure better testing capability, but at this time the jury is still out. The reason is XML. There is a lot of it, and now additional testing of it is necessary to ensure consistency between the XML and the classes. Most importantly, from my work to date, there is a certain amount of complexity in the XML/class coupling. While it’s a loose coupling, the need to test for Spring configuration that is incorrect or invalid, yet still valid XML, beckons.

In Spring’s defence, it provides an abundance of functionality and integrations with other web frameworks such as JSF, Tapestry, WebWork and Struts, and ORM persistence frameworks such as Hibernate, JDO, iBATIS and OJB, but when it comes down to it, I’m left thinking it takes a lot more to get started in Spring and become productive. It all seems anti-KISS and anti-YAGNI.

Putting aside my initial impression after about 4 weeks, my latest order of books from Amazon arrived, and while waiting for an X-Ray yesterday, I pulled out Beyond Java. It was interesting that some of its comments were:

  • The many frameworks designed to simplify the Java development experience do make more experienced Java developers more productive, but make the learning curve too steep for those new to Java.
  • Tremendous tools like Hibernate and Spring can let you build enterprise-strength applications with much less effort. But it can take a whole year to confidently learn how to wield these tools with skill.
  • AOP can also help, by letting you write plain old Java objects (POJOs) for your business rules, and isolated services in prepackaged aspects like security and transactions. These abstractions, though, make an ever rising river for a novice to navigate.

This all adds up to one thing, Complexity, when we should be working towards Simplicity. Why is it getting harder to write code? Surely with all these advances it should be easier; in fact, why are we still writing code at all? You have to wonder when the next leap in technology will occur.

Then again, we are still driving inefficient cars after 100 years when there are better and more efficient alternatives.

Linux Format Reader Awards 2006

The Linux Format magazine is having its annual reader awards in a number of categories.

These include (I’ve included my picks after each category):

You can nominate at Nominations close Friday 10 Feb 2006.

Adding to the Library Collection

I took the chance today to order some books from Amazon to add to the library. Of course, I’m still reading 2 current books, Spring in Action and the MySQL Certification Study Guide, in order to sit the second MySQL Professional Certification Exam.

As with most things, you start off looking for something on the web and end up somewhere else entirely. In this case, I was looking at the price of Linux distribution CDs at Linux Software Labs (Australia), which led me to the book Beyond Java listed on their site. I called my local computer book store, but it wasn’t open (Boxing Day public holiday), which led me to think, “well, I’ve been meaning to order some books from Amazon”. What were they again? This led me to come up with a whole new list, and I figured that for the cost of freight to Australia, I may as well order a few. So here is what I got.

Better, Faster, Lighter Java; MySQL (3rd Edition) (Developer’s Library); High Performance MySQL; and of course Beyond Java.

The hard part now is waiting the 6-10 days.

December Java Users Group talk on AJAX

I attended the December meeting of the Brisbane Java Users Group last night. The presenters, Alex and Brad from Working Mouse, a Brisbane-based J2EE solutions provider, gave a talk on AJAX.

What is AJAX? It stands for “Asynchronous JavaScript and XML”. While the name has stuck, it neither requires asynchronous communication nor needs to use XML; at least the JavaScript part stays. AJAX is also not a new language or technology, merely a collection of technologies grouped together to provide rich, in-page functionality within a web browser. The presentation centred on the DWR (Direct Web Remoting) implementation; there are in fact a number of implementations around in various server languages.

Let me explain some more. Providing dynamic content on a website is straightforward when you request a page; however, providing dynamic content within a page without refreshing it (and in turn keeping all page state) is not a feature of the HTTP protocol. The example always presented is selecting a value in a Country select box, which populates a States select box based on the selection, without the user seeing the entire page reload or having to wait for it. There are of course many other examples of its use.

AJAX isn’t new; in fact the underlying building blocks of AJAX, DHTML, DOM manipulation and XMLHttpRequest, were available in 1997 (as Brad mentioned in the presentation). I implemented functionality to do what AJAX does back in the late 90’s, probably starting in 1999, using only JavaScript, and some of that is still in use today on at least one of my sites. Of course, Google made this functionality popular with its use in Google Suggest a few years ago.

While the presentation was a good introduction to those that had not seen this in operation, the subsequent discussions over dinner prompted some strong reactions, which is good in our line of work.

This technology implementation is inherently flawed, primarily due to its reliance on the web browser: the multitude of browsers across platforms, and more specifically their lack of standards adoption, mean this technology is simply not available to all users. Of course, Microsoft Internet Explorer is a significant pain in the butt here, as it’s simply not standards compliant, and you are forced to write bad code to work in IE simply due to its market penetration. There are of course a lot more concerns: proxies at multiple levels of interaction can drive you mad, and there are the increased demands on bandwidth and server performance.

That aside, the need to provide this level of rich content within a browser is another matter entirely. It is driven by end-user need, and ultimately it is rather ridiculous: it’s complicated code, it’s yet another language within the application to support, support is difficult, and providing any type of automated testing is even more complicated. But I guess the strongest comments came from Max, who recognised me after 15 years. Max was a lecturer in my undergraduate studies from 87-89, a long time ago. I would place Max (not his real name, by the way; it’s a long story which took some research at the time) as one of the top three lecturers in my studies who influenced my path to where I am today.

His points were totally valid: why, oh why, are we doing this? This level of complexity is just ridiculous, all to make a browser do what it was simply not designed to do. I would tend to agree; we are forced again, by the influence of Microsoft technologies on end users, to provide a level of experience they have been brainwashed into expecting. It so reminds me of The Matrix movie, where everybody is living under the power of the machines (Microsoft), and there are a small few fighting a rebel cause to show them what the picture really looks like.

XP Group in Brisbane

Brisbane has another XP Group; I just found out about it. Info can be found at I’ve been involved to some degree in 2 previous groups in Brisbane.

I’m thinking about some ideas myself. I’ve got all the XP skills, and I’m now skilling up in Spring (a full-stack Java/J2EE application framework) and Hibernate (a powerful, ultra-high performance object/relational persistence and query service for Java). I’ve also got 2 other friends in similar positions.

Wouldn’t it be great if, for 6 to 8 weeks, a few hours a week, we could work on a project honing traditional XP practices, with people experienced in particular technologies helping others? Of course it comes back to some people giving everything to others, but I’m sure it doesn’t have to be that way.

HTML (ampersand) Character Codes

· (&middot;) Middle Dot
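• (&#149;) Bullet, black small circle (the standard references are &bull; or &#8226;; browsers only render &#149; as a bullet by remapping it from Windows-1252)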
« (&laquo;) Left-pointing double angle quotation mark
» (&raquo;) Right-pointing double angle quotation mark