The challenges of compiling non working Open Source (Part 2)?

Did I push to much in my last post? I don’t think so, but I guess it’s a fragile balance sometimes in Open Source between those keen end users, and the developers that do give so much towards their own creations (I understand, I’m in that category myself).

I was very proud of my work yesterday, it took a whole day of my time (I do have better things to do, like finish my own Open Source project HTMLtags, while will allow me to build my sample application, which I can then use for my MySQL Users Conference presentation). I learnt to dig around the net a lot, go on the wild goose chase several times, understand some more under the hoods of compiling, libraries and dependencies in the GTK world I would have otherwise not really cared about. But as I said, I got to a brick wall by the end, and it was dishearting.

It seemed my Bug Report #16604 on MySQL Workbench compiling listing a clear number of bugs was not well received by the development team. Maybe I should have slept on it, but about 1am in the morning I made another plea for assistance.

Well I woke this morning, and as I mentioned in my opening statement was I too passionate about this pursuit. Perhaps not. Overnight, a positive response to my #16604 provided feedback that my hard work was indeed incoporated, and the next clue of compilation (which could just be in a simple INSTALL file hopefully in the future) was presented.

So eagerly ignoring breakfast, and the increasing pile of dirty dishes, I jump in with this new intel. It ws just one line.

you need to configure gui-common with: ./configure –enable-grt –enable-canvas to enable the components needed by Workbench.


$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ ./configure --enable-grt --enable-canvas
$ make
make[3]: Entering directory `/src/mysql-workbench-1.0.0/mysql-gui-common/library_grt/source'
if gcc -DHAVE_CONFIG_H -I. -I. -I../.. -I../../library_grt/include -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/libxml2 -I../../library_util/include -I../../library_util/shared_include -I../../library_grt/newt -I/opt/mysql/include -I/usr/include/pcre -I/usr/include/python2.3 -DENABLE_JAVA_MODULES -DENABLE_PYTHON_MODULES -DLUA_TEXT_DIALOGS -g -O2 -MT lua_dialogs.o -MD -MP -MF ".deps/lua_dialogs.Tpo" -c -o lua_dialogs.o lua_dialogs.c;
then mv -f ".deps/lua_dialogs.Tpo" ".deps/lua_dialogs.Po"; else rm -f ".deps/lua_dialogs.Tpo"; exit 1; fi
lua_dialogs.c:20:17: lua.h: No such file or directory
lua_dialogs.c:21:21: lauxlib.h: No such file or directory
....
make[3]: *** [lua_dialogs.o] Error 1
make[3]: Leaving directory `/src/mysql-workbench-1.0.0/mysql-gui-common/library_grt/source'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/src/mysql-workbench-1.0.0/mysql-gui-common/library_grt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/src/mysql-workbench-1.0.0/mysql-gui-common'

More googling, hopefully http://www.lua.org is what this is???


$ cd /src
$ wget http://www.lua.org/ftp/lua-5.0.2.tar.gz
$ tar xvfz lua-5.0.2.tar.gz
$cd lua-5.0.2
$ ./configure
$ vi config
# (change Line 151: INSTALL_ROOT from /usr/local to /usr)
$ ./configure
$ make
$ make install


$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ make
make[3]: Entering directory `/src/mysql-workbench-1.0.0/mysql-gui-common/source/linux'
if g++ -DHAVE_CONFIG_H -I. -I. -I../.. -DXTHREADS -D_REENTRANT -DXUSE_MTSAFE_API -I/usr/include/libglade-2.0 -I/usr/include/gtk-2.0 -I/usr/include/libxml2 -I/usr/lib/gtk-2.0/include -I/usr/X11R6/include -I/usr/include/atk-1.0 -I/usr/include/pango-1.0 -I/usr/include/freetype2 -I/usr/include/freetype2/config -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/gtkmm-2.4 -I/usr/lib/gtkmm-2.4/include -I/usr/include/glibmm-2.4 -I/usr/lib/glibmm-2.4/include -I/usr/include/gdkmm-2.4 -I/usr/lib/gdkmm-2.4/include -I/usr/include/pangomm-1.4 -I/usr/include/atkmm-1.6 -I/usr/include/sigc++-2.0 -I/usr/lib/sigc++-2.0/include -I/opt/mysql/include -I/usr/include/pcre -DENABLE_JAVA_MODULES -DENABLE_PYTHON_MODULES -I/usr/include/freetype2 -I../../library/include -I../../library_util/include -I../../library_util/shared_include -I../../library_grt/include -I../../library_grt_modules/include -I../../library_grt_workbench/include -I../../library_gc/include -I../../library_gc/ftgl/include -I.. -DDATADIRNAME=""share"" -DCOMMONDIRNAME="""" -g -O2 -MT MGRT.o -MD -MP -MF ".deps/MGRT.Tpo" -c -o MGRT.o `test -f 'MySQLGRT/MGRT.cc' || echo './'`MySQLGRT/MGRT.cc;
then mv -f ".deps/MGRT.Tpo" ".deps/MGRT.Po"; else rm -f ".deps/MGRT.Tpo"; exit 1; fi
MySQLGRT/MGRT.cc: In member function `void MGRT::init_thread(const std::string&)':
MySQLGRT/MGRT.cc:115: error: `myx_grt_shell_print_welcome' undeclared (first use this function)
MySQLGRT/MGRT.cc:115: error: (Each undeclared identifier is reported only once for each function it appears in.)
MySQLGRT/MGRT.cc:170: error: `myx_lua_init_loader' undeclared (first use this function)
MySQLGRT/MGRT.cc:194: error: `myx_grt_init_lua_shell' undeclared (first use this function)
MySQLGRT/MGRT.cc: In member function `void MGRT::perform_shell_command(const Glib::ustring&)':
MySQLGRT/MGRT.cc:275: error: `myx_grt_lua_shell_execute' undeclared (first use this function)
MySQLGRT/MGRT.cc: In member function `Glib::ustring MGRT::shell_prompt()':
MySQLGRT/MGRT.cc:283: error: `myx_grt_lua_shell_get_prompt' undeclared (first use this function)
make[3]: *** [MGRT.o] Error 1
make[3]: Leaving directory `/src/mysql-workbench-1.0.0/mysql-gui-common/source/linux'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/src/mysql-workbench-1.0.0/mysql-gui-common/source'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/src/mysql-workbench-1.0.0/mysql-gui-common'
make: *** [all] Error 2

This seems defined in library_grt/source/myx_grt_lua_shell.c which has compiled to object code. So it’s probably clearly related with the lua dependency in
some way, but I’m not a C++ developer so apart from googling and greping it’s a good 3 levels over my head.

A very strong dead end again, clearly very C++ and MySQL specific. Back to my new buddy within MySQL AB, for the next clues.

The challenges of compiling non working Open Source?

One of the great benefits of Open Source, it’s Free, and you can get great support, sometimes even from the developers directly (rather then 5 levels of paid customer support for a commercial product). One of the greatest banes of Open Source, if you have a problem, and nobody has experienced and documented in a forum etc the problem you have with the same OS, libraries etc, you could be totally up the creek without a paddle, boat and for that matter water. (luckily you still have oxgyen)

Well, I’m having this problem with MySQL Workbench. A product promising so much, but if you can’t get the binary working on Linux to even start, where do you go.

You will see via the Forums, I’m not the only person. This is the current Bugs List.

Wanting to make a difference, even just for myself, and those others that also seem lost, I set out to pursue this to the bitter end. Long story short, some 6-7 hrs later I’m so close, and the response in a related Bug at the point I’ve now finally reached is:

at this moment we do not process bugs of mysql workbench, because it’s still in the intensive development, and we provided snapshot in order to give people first implression of WB.
I am changing status to ‘Analyzing’ and return to this bugreport when WB will be issued officially.

Well, my only statement here at 11:30pm at night, is I’m annoyed and frustated. The only reason why I’m compiling source is the snapshot doesn’t work, there is no information of when it will be issued officially, infact as I’ve mentioned previously, there are 3 different versions for 3 different OS’s at present. (More Info in Forums). It’s complicated as it seems to be all one way communication, people reporting problems, but no information feedback in return. Perhaps I should have worked on my own Open Source project?

My simple question in response to this comment is. “Please provide the source that built the binary, so at least we can work with the same apples. The released source and snapshot binary are not consistent.”

My only pursuit now is to publish my findings. I’m made signficant progress that will help others, but only so far. I’ve also uncovered 3 configuration errors, and library inconsistency as per minimum requirements (which isn’t documented, it’s hit and miss, trial and error), and a fatal compilation error stopping you in your tracks. My current logged bug report of this Bug #16604

The road already covered

My environment is CentOS 4.2 (a.k.a RedHat Enterprise Linux RHEL 4.2) recompiled and free.

The first pot hole.

Downloaded Linux Binary 1.0.0-alpha from ftp://ftp.mysql.com/pub/mysql/download/mysql-workbench-1.0.0-alpha-linux-i386.tar.gz (announced on forums MySQL Workbench 1.0.0alpha for Linux (November 24, 2005)

Errors on startup.
./mysql-workbench-bin: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.5' not found (required by ./mysql-workbench-bin)
./mysql-workbench-bin: /usr/lib/libstdc++.so.6: version `GLIBCXX_3.4.4' not found (required by ./mysql-workbench-bin)

Refer to my Initial findings published at forums (December 16, 2005) over 1 month ago with details of my environment and installed libraries.

Looking at the hole, now it’s a trench.

Diving right in.

$ su -
$ cd /src
$ wget ftp://ftp.mysql.com/pub/mysql/download/mysql-workbench-1.0.0-alpha-linux.tar.gz
$ tar xvfz mysql-workbench-1.0.0-alpha-linux.tar.gz
$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ ./configure

Errors
No package ‘gtkmm-2.4′ found
configure: error: Library requirements (libglade-2.0 gtkmm-2.4) not met; consider adjusting the PKG_CONFIG_PATH environment variable if your libraries are in a nonstandard prefix so pkg-config can find them.
Found http://www.gtkmm.org – gtkmm is the official C++ interface for the popular GUI library GTK+. First problem, current version is 2.8, previous version (as on home page links is 2.2).

More digging, clicking the home page documentation link, and some docs recommend binary installs (why am I not surprised). gtkmm 2.4 documentation. What the! The home page has 2.8 and 2.2, and the docs are 2.4. I’m a little confused, but I trundle on.

Found http://www.gtkmm.orgNo RHEL binaries, tried for Fedora Core 4. Attempt to Add FC4 extras to yum settings and do yum install gtkmm24-docs failed. (this was the docs recommendation)

So track it down on a mirror. Now that is now gtkmm24 Version 2.6???? I’m sure there’s a reason but I’m a GTK lay person and it’s confusing me.


$ wget http://public.planetmirror.com/pub/fedora/linux/extras/4/i386/gtkmm24-2.6.2-2.i386.rpm
$ wget http://public.planetmirror.com/pub/fedora/linux/extras/4/i386/gtkmm24-devel-2.6.2-2.i386.rpm
$ wget http://public.planetmirror.com/pub/fedora/linux/extras/4/i386/gtkmm24-docs-2.6.2-2.i386.rpm

$ rpm -ivh gtkmm24-2.6.2-2.i386.rpm
warning: gtkmm24-2.6.2-2.i386.rpm: V3 DSA signature: NOKEY, key ID 1ac70ce6
error: Failed dependencies:
libglibmm-2.4.so.1 is needed by gtkmm24-2.6.2-2.i386
libsigc-2.0.so.0 is needed by gtkmm24-2.6.2-2.i386
libstdc++.so.6(GLIBCXX_3.4.4) is needed by gtkmm24-2.6.2-2.i386

I figured it wasn’t going to be that easy. However, found an interesting note, (GLIBCXX_3.4.4), hmmm, see that error before. Is that sunlight I see looking up out of the trench.

It doesn’t ran it pours.

Won’t focus here.

libsigc Not found in CentOS RPMS Found at http://libsigc.sourceforge.net

$ cd /src
$ wget http://ftp.gnome.org/pub/GNOME/sources/libsigc++/2.0/libsigc++-2.0.17.tar.gz
$ tar xvfz libsigc++-2.0.17.tar.gz
$ cd /src/libsigc++-2.0.17
$ ./configure
$ make
$ make install
$ rpm -ivh gtkmm24-2.6.2-2.i386.rpm

Still fails with same dependancies including libsigc-2.0.so.0 which is what I just compiled.

Using a trench digger now, shovelling was too much work.

I won’t bore you with the iterative details of my approach, needless to say, basically I had to work backwards from this resultant coding, a number of times. But this worked. (Compiling the dependancies that is)


$ cd /src
$ wget http://ftp.gnome.org/pub/GNOME/sources/libsigc++/2.0/libsigc++-2.0.17.tar.gz
$ tar xvfz libsigc++-2.0.17.tar.gz
$ cd /src/libsigc++-2.0.17
$ ./configure --prefix=/usr
$ make
$ make install

One Down, note the –prefix=/usr is significant.. There must be some funky way to use LD_LIBRARY_PATH, LIBDIR, just need somebody to explain why simple evironment variable doesn’t work downstream.


# configure: error: Package requirements (sigc++-2.0 >= 2.0.0 glib-2.0 >= 2.4.0 gobject-2.0 >= 2.4.0 gmodule-2.0 >= 2.4.0) were not met.
$ cd /src
$ wget http://ftp.gnome.org/pub/GNOME/sources/glibmm/2.4/glibmm-2.4.8.tar.gz
$ tar xvfz glibmm-2.4.8.tar.gz
$ cd /src/glibmm-2.4.8
$ ./configure --prefix=/usr
$ make
$ make install

Two down.


# Requires checking for glibmm-2.4 >= 2.4.0 atk >= 1.6.0... Package glibmm-2.4 was not found in the pkg-config search path.
$ cd /src
$ wget http://ftp.gnome.org/pub/GNOME/sources/gtkmm/2.4/gtkmm-2.4.11.tar.gz
$ tar xvfz gtkmm-2.4.11.tar.gz
$ cd /src/gtkmm-2.4.11
$ ./configure --prefix=/usr
$ make
$ make install

Three Down.


$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ ./configure
checking for pcre-config... no
configure: error: Could not find pcre-config script. Make sure the pcre libraries are installed

What is pcre-config? Not much luck finding that, try pcre, come across http://www.pcre.org/ – Perl Compatible Regular Expressions. Well could be, no info about pcre-config, nothing on the Perl regular expressions man page. Well, no pain no gain, it’s in a yum list so.


$ yum install pcre

Already installed, well that doesn’t help. Some more digging around, more time digging a bigger hole, there’s a pcre-devel, let’s try that.


$ yum install pcre-devel

Voila!, I have a pcre-config script


$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ ./configure

./configure: line 3488: -f: command not found

Are a problem, lucky already documented in Forums (at least somebody has been this far).

Replace Line 3488
if ! -f po ; then
with
if test ! -f po ; then

Try Again.


$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ ./configure

./configure: line 7368: syntax error near unexpected token `else'

Haven’t see this documented.

Replace Line 7466 (note 2 lines earlier)
if test "${ac_cv_prog_PCRE_LIBS+set}" = set; theN
with
if test "${ac_cv_prog_PCRE_LIBS+set}" = set; then

Try Again.


$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ ./configure
$ make
../../library_util/include/myx_util_functions.h:30:18: pcre.h: No such file or directory

Patience is running short, a find locates file in /usr/include/prce directory.


# I'm now no longer amused with this, I'm not a C developer, but basic code should compile.
$ cp cd /usr/include /pcre/* /usr/include
$ ./configure
$ make
MGTableEditor.cc: In constructor `MGTableEditor::MGTableEditor(bool)':
MGTableEditor.cc:229: error: 'class Gtk::ComboBoxEntry' has no member named 'get_entry'
MGTableEditor.cc:265: error: 'class Gtk::ComboBoxEntry' has no member named 'get_entry'

Well, why am I not surprised. More reading, now into the bowes of GTK. API Docs at http://www.gtkmm.org/docs/gtkmm-2.4/docs/reference/html/classGtk_1_1ComboBoxEntry.html
shows that get_entry is valid for 2.4.

Hmmm, more though, now is that 2.4, or is that 2.6. Well this trench digger just fell into the hole is was digging.

A bigger hammer, now the full blown Hydraulic Excavator

I won’t linger here, been down this path before, however more plumbing was required so it was an iterative process again.

cd /src
wget ftp://ftp.gtk.org/pub/gtk/v2.6/glib-2.6.6.tar.gz
ftp://ftp.gtk.org/pub/gtk/v2.6/glib-2.6.6.tar.gz
cd /src/glib-2.6.6
./configure --prefix=/usr


# checking for sigc++-2.0 >= 2.0.0 glib-2.0 >= 2.6.0 gobject-2.0 >= 2.6.0 gmodule-2.0 >= 2.6.0... Requested 'glib-2.0 >= 2.6.0' but version of GLib is 2.4.7
cd /src
wget http://ftp.acc.umu.se/pub/GNOME/sources/glibmm/2.6/glibmm-2.6.1.tar.gz
tar xvfz glibmm-2.6.1.tar.gz
cd /src/glibmm-2.6.1
./configure --prefix=/usr
make
make install


cd /src
wget ftp://ftp.gtk.org/pub/gtk/v2.6/atk-1.9.0.tar.bz2
bunzip2 atk-1.9.0.tar.bz2
tar xvf atk-1.9.0.tar
cd /src/atk-1.9.0
./configure --prefix=/usr
make
make install


cd /src
wget ftp://ftp.gtk.org/pub/gtk/v2.6/pango-1.8.2.tar.gz
tar xvfz pango-1.8.2.tar.gz
cd /src/pango-1.8.2
./configure --prefix=/usr
make
make install


# Requested 'pango >= 1.8.0' but version of Pango is 1.6.0
cd /src
wget ftp://ftp.gtk.org/pub/gtk/v2.6/gtk+-2.6.9.tar.gz
tar xvfz gtk+-2.6.9.tar.gz
cd /src/gtk+-2.6.9
./configure --prefix=/usr
make
make install


#pango-1.8.2.tar.gz
# checking for ATKMM... Requested 'atk >= 1.9.0' but version of Atk is 1.8.0
# checking for GDKMM... Requested 'gtk+-2.0 >= 2.6.0' but version of GTK+ is 2.4.13
cd /src
wget http://ftp.acc.umu.se/pub/GNOME/sources/gtkmm/2.6/gtkmm-2.6.5.tar.gz
tar xvfz gtkmm-2.6.5.tar.gz
cd /src/gtkmm-2.6.5
./configure --prefix=/usr
make
make install

Now the test.


$ cd /src/mysql-workbench-1.0.0/mysql-gui-common
$ make clean
$ ./configure
$ make
$ make install

WOOOOHOOO! no errors here!

One down, one to go.


$ cd /src/mysql-workbench-1.0.0/mysql-workbench-1.0.0
$ make clean
$ ./configure
$ make

make[3]: Entering directory
`/src/mysql-workbench-1.0.0/mysql-workbench/source/linux'
make[3]: *** No rule to make target
`../../../mysql-gui-common/source/linux/libwbcommongui.a', needed by `mysql-workbench-bin'. Stop.

Investigation

$ cd /src/mysql-workbench-1.0.0/mysql-workbench/source/linux # (current make directory)
$ ls -l ../../../mysql-gui-common/source/linux/lib*
-rw-r--r-- 1 root root 7410114 Jan 18 22:43 ../../../mysql-gui-common/source/linux/libmacommongui.a
-rw-r--r-- 1 root root 8006062 Jan 18 22:44 ../../../mysql-gui-common/source/linux/libqbcommongui.a

Similar library name exists, but not libwbcommongui.a Well that’s it, it’s finally now looks not a environement problem, more a software development problem. I’ve logged my findings at Bug #16604. I guess we will wait for a good response.

Was it all worth it! Well 90 mins of documenting, the longest blog I’ll ever write. If the next runner can move this forward to a getting a binary from compilation, that then starts, then it was worth it. But only in a reasonable time.

Database Modelling Software for MySQL

I’m stuck between a rock and a hard place. I’ve been using DBDesigner 4 from FabForce, an open source visual design tool, and apart from working around a number of bugs, I’ve found it practical to design from scratch. The big plus, it works under Linux.

With the announcement that this was being incorporated into MySQL, called MySQL Workbench, I was looking forward to getting my hands on it. I guess that was about 8 months ago. Finally about 6 weeks ago, Version 1.0.0-alpha was released for Linux. Unfortunately it didn’t work, would not even start for me. Logged as Bug #15421, which got marked as a duplicate of Bug #15218 (I could have sworn I did a search first). Anyway, this got promptly closed as Unable to reproduce, but I see it’s finally been reopened again.

Windows is at 1.0.2, Mac OS/X is at 1.0.3-alpha, and yet linux is still at 1.0.0-alpha. What gives? There seems to be a clear lack of communication as to what’s going on.

The problem is, I’m preparing presentatations for the Brisbane MySQL Users Group, and the MySQL User Conference, and I want to import an existing MySQL database into a modeling tool to generate diagrams. It seems that DBDesigner4 is not MySQL 5 friendly, and given it’s not supported anymore where do you turn.

In DBDesigner, I get the error. “dbExpress error: Invalid Username/password”. Upteen double checks and then searching on the web finally leads to
http://forums.mysql.com/read.php?113,32121,57081#msg-57081, and the trick SET PASSWORD FOR ‘some_user’@’some_host’ = OLD_PASSWORD(‘newpwd’);

So now I can get a connection to the database, there ain’t no model to import. It seems I’m now forced to go look elsewhere for a open-source and free modelling tool that runs under Linux.

Ultimately I can’t really complain, this is the primary problem with Open Source, on one side, it’s Free, the other side support, documentation and help can be a hit and miss affair, and there are no commitments to a release schedule or feedback as there is no money involved.

Still if after all this time, I’m forced to go out and buy a product, it will be a great shame.

How many installations, and just what are they doing?

Would it not be great if on the MySQL website there was a page of stats (updated daily) that provided statistics like number of installations, a breakdown of versions registered (not certain I like that exact word) , OS’s, countries etc. More specifically, some useful stats on the engine types in practical use, avg number of tables per database etc. Of course the types of stats could be limitless, but with the success of MySQL as well as other open source projects, more imperial figures on installations other then just downloads I think would definitely benefit given the current momentum. (Availability of information to competitors could be both a good and bad thing.) Perhaps figures can be shown in percentages, not actual numbers.

Anyway, nice idea you say, we can all come up with ideas, but how could you implement something like this.

I made a post yesterday and mentioned the thought of an XML storage engine. It seemed to stir some feedback. Those notes were actually taken from more detailed notes that I’ve never published so I thought it would be good to explain some of the background.

Many months ago at the Brisbane MySQL Users Group our informal discussions following the evening presentation turned to a more detailed analysis of the different engine types available within MySQL. A throw away comment by somebody like ‘We have no idea if BDB is really used?’ prompted me to consider the possibility of why can’t this question be answered in the future.

How would it work? Well for each installation of MySQL on a server (given you can have multiple installations of varying versions per machine) the option to provide a feedback loop is made available post installation, it could even be part of the install processs (probably not in .rpm). The feedback loop configures the following basic settings.

  • Frequency (Monthly, Weekly, Daily, Once Off)
  • Granularity (Summary, Verbose)

What would I have this feedback loop submit:

  • MySQL Version
  • OS Version
  • Country/Locale
  • Number of User Databases (excluding mysql & test)
  • Number of tables per storage Engine per database

More detailed information, not necessary for stats, but perhaps useful for MySQL internals would be output from the SHOW set of commands. (variables, status, table status,engines, innodb status). This alone is another topic, allowing collection of information that can be used to evaluate usage, possible tunings etc). Additional information such as number of triggers, stored procedures etc, would identify if this level of functionality is actually really being used out there.

By registering with MySQL to provide a feedback loop from your server installation to MySQL, you are provided with a unique Id, which enables clear tracking for MySQL, but provides clear anonymity for the customer to ensure we are not taking any propriety information from them.

We never want to hide the data of an installation that is provided to MySQL, it needs to be in the open. Why not make it an open standard of data exchange (using XML), and why not then enable the installation itself to use these statistics internally for some level of reporting.

XML allows for the files to be reused outside of MySQL directly. Using XSLT you could take these XML files for better presentation of statistical information. This would probably go against the ethos of MySQL with now information stored in XML instead of the database. So why not create an XML Storage Engine.

It would for lack of a better word be grossly inefficient, however it was more a Read/Only type structure, more you just write out a chunk of data in one go, and you always read the entire file. You don’t do partial updates (you could, but you rewrite the entire file), it leads itself to statistically information, written once only. Of course you would have to satisfy the search

What other possible extensions for an XML storage engine, you could store and read RSS feeds, even OpenDoc standards as implemented by Open Office for Documents.

Now, regardless of all the associated complexities like, how do you post add this functionality to older versions, how do you automate the scheduling of the feedback look across different OS’s, blah, blah, blah, the purpose is just to throw the idea around for a minute and see what falls out.

Additional statistics more useful for internal MySQL usage could pinpoint numbers of upgrades, from which version to which versions (more specifically if not current GA version). The biggest downside is as part of Open Source you can’t enforce gathering any of this information, so when the website says 1 million installations (it’s 1 million of people that have taken the time to notify MySQL, and that could be 10% or 50%, you just don’t know).

Could there perhaps be space here for a commercial product? I doubt it as this is very tightly coupled with MySQL, and unless there was seamless integration I doubt very few people would go to any trouble after installing MySQL.

It’s just an idea, but it’s nice to have ideas at least some of the time.

MySQL 5.1 is gaining some momentum

It wasn’t that long ago that MySQL released the GA Release of Version 5.0 with major new features (Oct 24 2005). It still took 5.0 about a year to go from alpha to GA, however I’d suspect a much shorter turnaround this time.

Version 5.1 is already at alpha, and the largest public functionality mentioned has been Partitioning. It is also anticipated that Storage Engines (a very handy MySQL feature in comparison to other RDBS products), will be a hot-pluggable API instead of a source re-compilation. Now I’ve never even looked at the Storage Engine code, but it’s been talked about a few times, particularly the CSV Storage Engine in general discussion.

There are some new features being documented now, but not generally available in an alpha build including Events Management (5.1.6) and XPath support within SQL (5.1.5).

Well I could use the events management immediately with some refactoring of code into a stored procedure. My current CRM allows for a system wide (ie all users) Phone Batch to be created for phone donations. Batches are locked into daily, so a new batch must be manually created each day. The ability to both automatically create a daily open Phone Batch, as well as automatically close the previous day’s while only a small task, removes a currently repetitive task from somebody’s day. Of course at present, this is planned, but via a cron job and wrapper script, now I could do it all in the Database.

XPath expressions, interesting, I use XPath Explorer a great standalone Java app, and Eclipse Plugin, but I guess now I can just do it at a SQL Prompt. What I’d really love to see is an XML Storage Engine.

XML Storage Engine

What about an XML Storage Engine? It would for lack of a better word be grossly inefficient, however it was more a Read/Only type structure, more you just write out a chunk of data in one go, and you always read the entire file. You don’t do partial updates (you could, but you rewrite the entire file), it leads itself to statistically information, written once only. So would it provide any benefit. Really only you have any sources of XML data and you just want to keep this information in an XML format, effectively the MySQL data file is an actual XML file that can be easily copied.

XML would allow for the files to be reused outside of MySQL directly. Using XSLT you could take these XML files for better presentation of statistical information. What other possible extensions for an XML storage engine could be possible. You could store and read RSS feeds, even OpenDoc standards as implemented by Open Office for Documents.

Would it serve any real purpose? Probably not. But with MySQL Storage Engines at least it’s easily possible.

Unit Testing A Database

In a recent job interview I was asked the question regarding Unit Testing/Automated Testing of a Database? An interesting question and indeed an interesting problem. I thought it was a good topic to describe what I’ve done in the past, and where I would go for a more complete testing environment given the opportunity of a entire XP project.

This is the approach I have implemented successfully in the past. It’s not a complete solution, however at the time with the client it provided appropriate coverage.

I don’t use a framework such as dbUnit to load data via XML, or specifically test data. XML is ugly to store data, and also with maintenance and comparison. I start with a pre-configured database of representative sample data, refer to my notes later on this, and then I use the tests of the application to perform the necessary data manipulation. This ensures that you are as close as possible to testing actual situations, and ensuring that any issues the application does (such as enter bad data, or RI failures) are caught appropriately.

Within this process an automated build test would first reset the database to a known set of data. I’ve also found that this helps as you can also recreate the schema if necessary. As part of Schema Design in an XP Development, I have two ways to create the database schema. You can create it from scratch (so there is always appropriate SQL to create the current version of the Schema, lets say BUILD_102. Alternatively, you can always upgrade between releases, for example between BUILD_101 and BUILD_102 with the appropriate upgrade scripts. Upgrade or patch scripts only move one version to the next version. It’s not possible in a production environment to simply recreate your schema for each release, however for testing and training you can. It’s also pointless after 50 releases to have to perform 50 patch releases from the original source schema for every automated build test.

This does lead to two paths necessary for creating a schema, but this can also be tested adequately in an automated way.

I also split my application tests suites that use the sample data into two buckets. Destructive and Non-Destructive. The reason is the Non-Destructive tests (i.e., non DML statements) can be re-run as many times as necessary. The Destructive tests (i.e., DML statements) can only be run once, before the database must be restored. Of course you can have the approach of setUp() and tearDown() within JUnit however it’s cleaner if you can extract this somewhat to a higher level, making the Unit Tests easier. By also running tests that don’t continually use the some data, or builds the data though test execution, you get a better coverage of different data sets. To give a few examples, You could create a Test that created a row of data, then edited the same row, then deleted the row. These are indeed valid, but if the first test fails, how do you know if the update and delete tests are also broken, they are by dependant by default and will fail, but did they really. If with your sample data, you created a new row, edited a different pre-configured row, and also deleted a different pre-configured row, you could eliminate the need to dependencies.

Now of course, there are situations where data must be specifically checked at the database level, for example it may never be displayed in your application, it may be intermediate information that is then summarised for display, or internal audit information held against data (for example Create User, Last Updated User), or data created by procedures or triggers. There are also situations when even within an application testing that can verify the data in a User Interface, you want to verify this at the source.

To this end I have a custom written JUnit extension that can perform specific SQL statements and comparison. I’ll need to write about this and provide this at a later time. (when I can dig it up)

Sample Data

Sample Data in the database is pre-configured, not in XML files, but so it can be managed by more primitive means, either by a database GUI interface or via SQL flat files or even text files. Why have pre-configured data this way? A few reasons.

  • It’s not coupled to your tests in any way, so it can be reused, for example as Training Data.
  • You can use database specific tools more easily say in loading the data in a relational way.
  • You can use the same database specific tools to export the data easily, if say you use an application to modify certain information.
  • You could more easily incorporate legacy data that is also being migrated if you use the same database specific tools.

Granted XML is universal in it’s data representation, it’s more self descriptive, but it’s a really pain to edit manually, and it’s very verbose when there are simply more primitive methods of this type of data management.

So we are creating a pre-configured data set, and an extensive one when possible. As I mentioned, the re-use capabilities for training or demonstrations really works.

Training Data

I have successfully with a number of systems, specifically CRM implementations used a Cartoon Environment for the sample data. There are a few reasons for this. First, most people I’ve ever met can related in some way to some set of the data. If they can’t, then there can read info online, or watch a movie etc, and get an appreciation from the representative data set, effectively I’m leveraging of the time and effort of others here, much better then a non-descript set of data.

You have the cartoon characters (e.g. Mickey Mouse, Donald Duck, Daffy Duck, Marvin the Martian, The Simpsons, The Flintstones, The Jetsons), use all the streets and rides as Disneyland as addresses, the animators as the users (e.g. Walt Disney, Chuck Jones, Stan Lee), you can use the different studios (e.g. Warner Bros, Disney, Pixar) for different states or countries, you can use shows or movies (e.g. Toy Story, Shrek) to group characters in other ways.

With this type of data, common attributes such as birth date, family units, nick names, people deceased etc, are all part of the available data. It’s surprising how much information you can find when using The Simpsons for example, of full names, addresses, interests etc.

It’s impressive when the CEO of a company is showing the application to overseas business partners, when his knowledge of the application (from his management perspective) is sufficient, there is no knowledge of the data really necessary to use or explain as it’s commonly used and generally understood.

At this point I would like to ensure that I correctly acknowledge the registered trademarks of Disney, Warner Bros, Hanna Barbera, Pixax, Dreamworks and that I am not using the names for any profit.

Summary

So this is what I’ve used in the past. What would I do in the future if I was charged with bullet proof testing of a database, even independent of an application, effectively 100% test coverage of the data. Well, this is an unproven approach, but I’d relish the opportunity to give it a full blooded test one day.

How to test the database with an automated test approach.

I’d consider the breaking down of testing into 3 areas. These being:

  1. Schema
  2. Data
  3. Business Logic/Referential Integrity

Each of this is effectively built on the the preceding points.

Schema

This would be quite straight forward it’s a flat comparison between schema’s, which could be managed via the appropriate products data dictionary tables using SQL. You could even simply compare 2 schema’s in a few simple SQL statements. You could also use the approach of export the schema definition, and then compare flat files. You will find some downsides to these approaches, ordering is a big thing, columns within a database table, or the order of the tables that are exported may not be guaranteed. However given appropriate standards are defined used of tools this comparison could easily occur.

Being able to verify patches between releases, and full installed schema’s are also possible. The schema is the easy part.

Data

Data could be tested in varying means. Counts, sums and sample comparisons, but it’s also just data, why not md5sum the entire data. Why not even dump the data to flat files, and use basic difference tools for comparison. One simple approach. Especially if you are loading data, using or manipulating it, then you can export and compare at a file level. This would work very well for data considered Read Only for the life of testing.
This format may allow you to compare data between two different database products, e.g. Oracle use for your transactional online processing, and MySQL use for Web Data or Management Summary Reporting application.

In order to test the data you need the schema, but how can you test the data without the business logic and Referential Integrity. Within MySQL you can easily disable foreign key constraints, or easily adjust the table type to a structure to ignore this syntax. This could allow you to run tests with and without Referential Integrity to determine the strength of your application.

Of course this is a static version of the data. Performing separate testing of DML statements directly against the database could prove a waste of time, unless your application was written in such a way, that your application database layer was a complete API. Still you would be simulating what your application is ultimately doing, so it could be overkill. You could apply the techniques of comparison with know results after a successful running of automated build testing.

Business Logic/Referential Integrity

The hard stuff. Well you could be half way there with adequate schema and data testing. At least you are then confident that core integrity exists.

The problem is also to do with application integrity verses database integrity.

Let’s take a percentage, it goes from 0 to 100. Now using MySQL for example, you would define this as TINYINT(3) UNSIGNED, giving you a valid range of 0-255, and by default a display characteristic of 3 characters (the (3) in this example is just beautification).

You application logic restricts the value into this column to 0 to 100. But do you enforce this at the database? Depends on your needs. If the application is the only way to insert and maintain data, then you could get away with it, if data can be managed from other external systems, you may have APIs that also need to manage it. What if you grant SQL access to DBA’s, could they accidentially mess it up. That song “It’s a fine line between pleasure and pain” comes to mind at this moment. I guess what I’m getting at here, in solely database testing you could easily insert 255 into a percent column and pass a number of data specific tests. I’d assume it would fail some as well otherwise your tests aren’t complete, but when using the application you could never test 255, as the client would never allow it.

There are a lot more issues in RI testing, Cascading UPDATE, DELETE rules for etc. And then when you work all that out, you have to start with Triggers and Stored procedures. I’m not going to spend any time here at this time.

Unforeseen Side Effects

Data is a strange beast. It’s the source of information, so I always like to go back to the data for comparison, however the lack of good data (most notably Legacy Systems Migration) can drive you mad. What good is it to have a new system but not be able to enforce an adequate level of Referential Integrity or Business logic due to incomplete historical data. In essence this has proven in the time I’ve also supported large systems, a good portion of the development cost in support, it’s bad data and/or the need for a simple application to have more complex rules to cater for so called incomplete data.

I’ll give you a trivial example. Gender. In the new system an organisation will always ensure they get the gender of a customer (let’s not wonder how they do it, it’s just an example). So the application is designed to support Male/Female, Reference Data may exist to translate M & F to Male and Female respectively for data storage efficiency. Check constraints, enumeration data types (which I don’t like) may exist. Reports may do side by side comparisons.

Now the company buys a competitor, and then gets their database of 500,000 customers, but they don’t record the Gender. Do you then relax all your great integrity? Do you introduce a gender of Unknown? But that’s only for display, the maintenance screens can’t allow you to select it, so you then need a different level of Reference Data management. Do you make an educated guess and correct the data? Does the customer do an expensive mailout campaign and data collection process to correct this information? So what’s the big deal anyway the customer asks? Well if it’s an organisation that sells hygiene products, you don’t want to sending out material on Shaving Cream to Women on your list and Body Wax to Men. However you can’t use that explanation to describe solely a database driven reason to the customer for the cost of introducing this data. How do you show business value to the customer, when they simply what the data available?

I know this is a trivial example, but if I had a dollar for every trivial problem that customers spent months on,verses the really hard problems, I be writing this from a much more comfortable and relaxing resort haven. (A ski resort rather then a beach resort)

Conclusion

So can you Unit Test a Database solely without an application? Yes. Would you want to? Maybe, to a certain degree. Depending on your type of data. If all your information is highly visible in data entry and data retrieval, you should couple testing more closely with your application. If your data is very generated and collated, rarely user entered but bulk loaded, such as Sensis information or GIS information, then dedicated testing all aspects of the database decoupled from the application could indeed make your application test easier, because it’s easier to identify bad data that the application creates.

Some light hearted Google Fun


Sometimes I wonder how I get to trivial information. Well in the Wikipedia article Google Hoaxes you will find that the Google search engine is valid in a number of unusual languages including Klingon. I’m a Star Trek fan, but that’s going a little to far, however I guess if they create Elvish Language the LOTR people would love it. My favourites are Elmer Fudd and Bork Bork Bork. For those that don’t know this line, it’s famous from the Chef in the Muppets.

It took a while to find it, but you can listen to the famous Chef Song Here.

Database Modelling within an XP Methodology

In an eXtreme Programming (XP) Agile Methodology approach towards software development the absence of adequate database design, or the scant regard of it, with the assumption that a framework and persistence infrastructure will take care of that can be a disaster in a larger enterprise solution. In essence it’s a scaling effect. The smaller the system, normally the smaller the number of users, amount of functionality and volume of data does not show the inefficiencies in database design as they can be masked by acceptable performance. But scaling up the system, or designing a large enterprise system the effect will become multiplied quickly. Of course using solid XP practices, the ability to make changes and integrate will be easier, but the amount and complexity of changes may be significant.

A more pragmatic approach is necessary in Database Modelling and Design, especially in a larger enterprise solution when using XP or an Agile approach. Assuming that the choice of Relational Database/s has been chosen, greater care is necessary and advanced preparation and planning required. Purists could argue YAGNI, but ultimately the customer will be distraught if the system is perfect in functionality and user interface but can’t handle the performance a production load of users or gradual growth of users when all is well in testing, demonstrations and customer training. The other catch is the need for additional disk space added monthly due to unknown requirements.

Additional considerations such as legacy systems data migration, database sizing, database growth, performance requirements, number of users etc can’t conform to a traditional XP approach. These tasks require varying lead times, for example the purchase and configuration of hardware and software need to be augmented with the XP approach.

XP is not for every project, and in the number of instances I’ve been involved with, certain considerations due to the environment, customer and usually management are necessary to be adjusted or tailored for the specific project. I’m not stating that an XP approach can’t apply to large scale enterprise Database Modelling, more that some adjustments particularly within the Planning Process are needed to integrate a more balanced solution.

The database is the foundation, I draw an analogy when discussing with friends to building a house. If you don’t have the core foundation correct, the slab, the essential primary fittings of plumbing, power etc, and a floor plan of key, important and known things, you will forever when building your house be spending additional time and resources to prop this up, taking away time, money and energy from the significant part of building the house so it can be ultimately used. (Thought: I wonder how you would build a house XP style. What’s the most important part for the customer. Something to ponder one day).

Are foundations perfect? No. Do they change and adapt? Yes, They do. However the cost is significantly higher, and the investment in getting it right the first time is invaluable.

I can’t count the number of times I’ve come onto an existing project, and the Database Designer for lack of a better word is a novice. It makes me shutter. It’s an expertise I’ve specialised in, however I’ve had to broaden my skill set, as this task is not used throughout a traditional SDLC project of 12-24 months, and it really saddens me when simple 101 mistakes are made and the downstream impact is significant, and management don’t know or realise the impact. Even worse is when I tell them what’s needed for a correction, and they look at the impact, and the decision is that this fixed known cost to correct is worse then any projected unknown down stream costs of maintenance and future development.

Ultimately I’ve got my way on projects more easily when I’ve also focused on Application Performance Tuning in projects, where I look solely at the application needs, not just the hard core DBA or System Administration tuning. A DBA doesn’t really care about the application and end user impact, they care about the figures in the database. In this situation, low level structural changes and associated costs are weighted against application performance ultimately necessary for the end user. I remember one project that took the development/user/testing/release team 3 months of work (probably ~ 200 man weeks of work) to implement and deploy a key structural change that I had identified and proposed as part of longer performance analysis period of this system. Of course when you had 6 full-time DBA’s alone, 33 remote distributed systems and 1000’s of users, the resultant impact and the future improvements possible were worth the investment. The need not to upgrade the hardware alone was worth it. This project was also over 10 years ago and a lot of techniques have changed since then. Anyway, back to the point of the discussion.

Let me give you an example situation where you use a traditional XP approach to software development, and how to weave in a more structured database design and modelling approach.

User Stories

As part of the gathering of initial User Stories from the customer, the Database Designer should be reviewing these and beginning to build a high level Logical Data Model. The Database Designer is not needed in any initial interaction with the Customer, however early reviews may clearly indicate gaps in stories due to historical experience that can then be feed back to the customer for considerations. Initially it should just be on paper, or a whiteboard, as initially this should correspond to the high level comparison with the User Stories and also the fluid nature of changing user stories. However it should highlight immediately known key entities, key relationships between entities, key integrations to external systems and areas that involve early input of volume estimates of data, perhaps with comparison to existing legacy systems.

For example, a rates billing system I was involved in had ~400,000 clients billed quarterly (4 times a year), and each bill had on average 10 line items (just looking a small sample, and getting a figure from the legacy system). Now it doesn’t take much to consider the size of tables which would record billing history (400,000x4x10) for each cycle, or GL reconcilation over 5 years (400,000x4x10x2x5 that’s 160 Million). Disk storage alone without any hint of the average row length, or indexes can be guestimated when a number of key entities are identified. Of course it could be out by a factor of 2, 5, or 10 times at this early stage, but not a factor of 100 times.

In a situation like this with any entity, even at the first mud map, I would be recording a number of indicators when possible. These would be:

  • Initial number of rows
  • Annual growth in rows
  • Projected rows after ‘x’ years

In addition I’d also flag each entity roughly for performance considerations by access, and also volume of transactions. This helps in identification of key performance considerations which may impact the Database Design in areas such as indexes (i.e. Disk Space) and schema optimisations after normalisation.

  • OLTP
  • Batch

For example, the billing process that creates quarterly bills and inserts millions of rows is both a Batch Process and a frequency of once every three months, as apposed to new accounts which is OLTP, happens at a regular quantity per day every day, and during the day.

Now, in the mud map these figures don’t have to be accurate, in fact in the billing system example, if the table didn’t have 10,000 rows it didn’t rate a mention, and 10,000 to < 400,000 was considered small. The limit here being the volume of the most key entity. Simply because a number of tables will be a factor larger based on this key figure, and this will swamp any insignificant tables. Again, this is week 1 of a potential 52 week project, it can afford to be vague in areas.

Spike

How do you term this in a XP environment? Call it a Spike. The purpose of a Spike is to explore options and evaluate risk. While in general the practice would be to throw away the results, in this case these Spikes should be kept and built on progressively.

Release Planning

As part of the Customer’s User Stories being estimated and prioritised within Release Planning, more refined Logical Modelling and even Physical Modelling is necessary. Of course until the customer refines priorities based on estimates, the actual order of implementation can’t be confirmed.

It is possible however a number of stories within an iteration may not relate together within the Database Model. How do you address this? Regardless of the actual attributes of these logical entities that are not available, a physical model representation can be built on necessary tables and relationships to ensure practical use of functionality.

Should the need for missing relationships within a Database Model impact the priorities of User Stories? No. Estimates should accurately reflect if additional work is needed for any given User Story, hence the Database Designer is necessary in the estimation process.

Iteration Planning

As part of the Iteration Planning it is key that the physical Database Requirements are in place to enable Writing Unit Tests and Writing Code for each Development Story for the Iteration. This should occur at the start of the Iteration, but is not before.

Daily Stand Up Meeting

This is the best opportunity during the Daily Stand Up Meeting to advise of structure impacts, or developers to request more involvement by the Database Designer in their specific task.

No system is perfect, and there will be times that the Database Model does not reflect the requirements by a coder for a task. At this time the developer is responsible to adjust their instance of the Database model to complete the coding task. The issue is; does the coder have the ability to modify the database schema when checking in? No. The responsibility for the Database Model is for the Database Designer, a difference to the normal coding practice where any coder can modify any code. The next section describes this reason in detail. At this point there is an inconsistency within the repository, and any automated testing for a build will fail. And this is what should happen. From the developers perspective their instance of the Code Ownership is correct, but for the entire iteration it is not.

One could argue this is not valid however, the cycle of Test, Code, Refactor, CheckIn while applying to the individual task, should also apply to the Database Design, however it should be at an iteration level not at a task level. To have your continuous builds breaking will however raise a lot of red flags, and I’m sure this approach will ensure that the tests which are broken as a result, will be corrected by the schema definition (the Code), there will be some refactoring if for example the Developer didn’t serve in the best interests of the bigger picture, the schema is checked in, and the continuous build bar is now green. Yipee. The bar is green, the code is clean, we can all go home now.

The Database Designer

The Database Designer should be dedicated resource in the development team for this task. For example in a team to 6-10 developers you would have 1 or 2 people. Even in a team of 2 your would have 1 person responsible. Unlike coding where any of the team pair up and work on tasks within the Iteration, the Database designer team is responsible for Database Design, like the Development Team is responsible for Coding. There are a few reasons for this:

  • They are ultimately responsible for the structure, the foundation, and this is the bigger picture that includes visibility and scope of further iterations and releases.
  • The skills are more specific.
  • They should also be a part/time developer of the team, so as to best understand the dynamics and also be part of the team.

Correspondingly, the end of the iteration should include an addition code review, that is schema related. A difference report of the schema definition at the start and end of the iteration should be compared, to ensure best practices in the larger picture as well as standards, optimisations, future performance considerations (e.g. indexes), future disk requirements (e.g. adding a new index to a 160 million GL table will take 4GB of disk space).

Other Factors

In fact as part of my upcoming conference presentation Overcoming the Challenges of Establishing Service and Support Channels I spend some time discussing Data Quality. Quite often this is the bane of support due to the complexities of software development to cater for data exceptions, or most commonly data anomalies due to historical data not meeting minimum RI and data specifications in your new system.

Ruby

Being a little despondent regarding Spring, a framework I’ve chosen to skill up in Read More, I’ve changed tack to investigate further Ruby. I was in a training demonstration of Ruby late last year, I’ve had other colleagues talk about it, and in a number of readings of late, Ruby has been making an impact, so time to delve in. I’ve got my working notes on my Wiki, and it took all of a few minutes to be operational. There appears a good wealth of reference material available including at least 2 online books. You can check out these in my Ruby References section.

Here is a comment from one of the current books I’m reading now. Beyond Java.

My partner and I decided to implement a small part of the application in Ruby On Rails, a highly productive web-based programming framework. We did this not to satisfy our customer, but to satisfy a little intellectual curiosity. The results astounded us:

  • For the rewrite, we programmed faster. Mush faster. It took Justin, my lead programmer, four nights to build what it had taken four months to build in Java (RB: They were using Hibernate,Spring and Web Work). We estimated that we were between 5 and 10 times more productive.
  • We generated one-fourth the lines of code; one-fifth if you consider configuration files
  • The productivity gains held up after we moved beyond the rewrite.
  • The Ruby on Rails version of the application performed faster. This is probably not true of all possible use cases, but for our application, the ROR active record persistence strategy trumped Hibernates’s Object Relational Mapping (ORM), at least with minimal tuning.
  • The customer cared much more about productivity then being on a safe Java Foundation. (RB: Highlighted)

As you can well imagine, this shook my world view down to the foundation.”

From Beyond Java. by Bruce. A. Tate pg 3-4.

Book Review (Part 1) – Better, Faster, Lighter Java

Well if the weight of the book has anything to do with it, it’s the lightest Java book I’ve got. Better, Faster, Lighter Java, which I got from Amazon, has been a quick read. I’ve done a quarter of the book (60 pages) in one bed-time reading. Some good information, I’ll provide a review when I’ve finished reading the book. What’s surprising that of the content that can be confirmed solely programming (i.e. the code), there were a number of errors in the book already. Here’s a summary of comments of what I’ve already sent to the publisher. (just showing the technical stuff)

Example 1-1. Counter example: implementation (pages 3-4)

Point 1:
Book: Mid page (page 3), you have public abstract Long getID();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following lines (page 3)
public abstract void setID(Long id);
public Object ejbCreate(Longong id, int count)
public void ejbPostCreate(Longong, int count)

Example 1-2. Local Interface (page 5)
Point 2:
Book: Top of Page (page 5), you have public abstract Longong getId();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line on (page 5)
public abstract void setID(Longong);

Example 1-3. LocalHome interface (page 5)
Point3:
Book: 4th last line (page 5), and 3rd last line.
Comment: Same comment as Points 1 & 2. ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line on (page 5)
public abstract void setID(Longong);

Example 1-4. Transparent counter (pages 13-14)

Point 3:
Book: On the second last line of page 13, you have private string name;
Comment: ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Point 4:
Book: On the first line of page 14, you have public void setName(long newName) {
Comment: ‘long’ should indeed be ‘String’

Point 5:
Book: On the fourth line of Page 14, you have public string getName() {
Comment: As per point 3. ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Figure 2-1 (pages 18-19)

Point 6:
Book: You list 7 points that correspond to the numbers in Figure 2-1. Point 7. Easier to Maintain
Comment: You have no point 7 in your figure.

Unreferenced Code (page 25)

Point 7:
Book: Second line of code in section. String prefix “This code is “;
Comment: You are missing a necessary assignment character = (equals) between prefix and “The…”

Unreferenced Code (2nd Example) (page 25)
Point 8:
Book: Lines result = result + “much, “; and result = result + “simpler, and neater.”;
Comment: While this is correct, it is indeed even more simpler if you replaced on both lines of result = result + with result += . It would not have really been work the mention except you are explicitly trying to demonstrate “simpler and neater”.

Unreference XML (page 32)

Point 9:
Book: In the middle of the code you have the line <include name=”**/*Test.class” />
Comment: While this is indeed valid, it would not work with you present example that you are indeed attempting to Automate with Ant. In your example, you define your JUnit Test Case class Name as TestAdder. This statement would not include the tests. It should indeed be **/Test*.class (with the third * ‘asterick) being a suffix to Test, not a prefix.

Unreference Code (page 45)

Point 10:
Book: Second line of coding section: Account valueObject;
Comment: You do indeed not use this variable, while not an error, it is unnecessary. Refer to next point for more information:

Point 11:
Book: Middle of code: account.setAccountNumber(…..), and the following line account.setBalance(….
Comment: You define no Account variable called ‘account’. So you do infact need a line of syntax: Account account; at some point. Even this however is invalid as you have not obtained an Account object, in order to use setter method setAccountNumber() and setBalance(), so you would need to have Account account = new Account(); This however is also invalid as your previous code (Page 43-44) which defines the Account class has no default constructor of no args. It is however I believe valid, as even though you don’t extend Object, you would get a define implicit Object no args constructor. I’d have to check that, but the point remains, it’s not clean sound code in regards to new Account().
And then to complete these 2 lines, setAccountNumber() is not a defined method of Account, indeed your explicitly don’t have it as part of your comments. (Page 43 Remebering your requirements, you want to keep the account number private, so you scope it accordingly, and omit the setter).

All that being said, you could do the following as an alternative to these lines.

Account account = new Account(result.getString(“accountNumber”), (float)results.getDouble(“balance”));
return account;

This could even be simplified further to simply

return new Account(result.getString(“accountNumber”), (float)results.getDouble(“balance”));

Nice and clean.

Point 12:
Comment, while on the subject of this coding example, given changes necessary, I’d make two other comments regarding this code fragement.
Firstly, you close your stmt variable with stmt.close() but you don’t close your ResultSet variable result. Good coding practice would close both, with appropiate error checking as shown with stmt.
Second, while this method is static, I would not choose to use a global Connection variable referenced as conn. This should be passed to this method as an argument.

Of course I’ve only looked at the first 1/4 of the book for some bedtime reading, and I haven’t actually taken this code and passed via a compiler, but I wanted to bring these comments to your attention. I am however enjoying the content so far of this book.

The Java Spring Framework


I’ve been reading Spring in Action as part of reskilling in Spring Framework and Hibernate. The rationale of this was, I wanted a better testing capacity of my web apps, and after some review of a number of options and input from other colleagues I went down the Spring path.

Now, Spring throws a lot of new terms at you,Aspect Oriented Programming (AOP) and Inversion of Control (IoC), and it takes some time to get into the application. A Hello World example is not a simple thing, with a number of moving parts. Still no pain, no gain. The obvious change in this development path was a significant increase in XML which started to concern me.

After some more reading and examples, I came increasing worried that I’d opened a can of worms. I choose Spring to ensure betting testing capability, but instead at this time, the verdict is out. The reason is XML. There is a lot of it, and now additional testing of this is necessary. It’s necessary to ensure consistency between XML and Classes, and most importantly, from my work to date, there is a certain amount of complexity in the XML/Class coupling. While it’s a loose coupling, the ability to test incorrect or invalid Spring XML, but valid XML beckons.

Spring Framework In some defense, springs provides an abundance of functionality and integrations with Other Web Frameworks such as JSF, Taspetry, WebWork and Structs, ORM Persistence Frameworks such as Hibernate, JDO, iBATIS, OJB, but when it comes down to it, I’m left thinking, it takes a lot more to get started in Spring, to get productive. It all seems anti KISS and anti YAGNI.

Putting aside my initial impression after about 4 weeks, my latest order of books from Amazon arrived, and while waiting for an X-Ray yesterday, I pulled out Beyond Java. It was interesting that some comments where:

  • The many frameworks designed to simplify the Java development experience do make more experienced Java developers more productive, but make the learning curve too steep for those new to Java.
  • Tremendous tools like Hivernate and Spring can let you build enterprise-strength applications with much less effort. But it can take a whole year to confidently learn how to wield these tools with skill.
  • AOP can also help, by letting you write plain old Java objects (POJOs) for your business rules, and isolated services in prepackaged aspects like security and transactions.These abstractions, though, make an ever rising river for a novice to navigate.

This all adds up to one thing, Complexity, when we should be working towards Simplicity. Why is it harder to write code, surely with all these advances it should be easier to write code, in fact why are we still writing code. You have to wonder when the next jump in the technology will occur.

Then again, we are still driving inefficient cars after 100 years when there are better and more efficient alternatives.

Handling User Requests of 'I got an error!'

Don’t you just hate that, a user at the end of the day, week, or only when you ask them say “I got an error.” Ok, well can you give me some more information. “No” is the normal answer.

I presently manage this in my applications with the following 2 references.

  • Any 500 error pages are automatically emailed, so I can see frequency, and hopefully via a stackTrace, a good indication of location.
  • For every requested URL, I log the details of date/time, user, sessionId, URI and all Parameters (GET&POST)

Given this, when an error occurs I can usually see what the person was doing via Access Log reporting. The issue is, it’s not reproducible (if it’s a POST), I can’t just retry the command, and with coding practices, I can’t just cut/paste my log output (as that would be a GET, and in the even of a POST, the code would ignore it). In addition, it doesn’t fully show what the user is actually seeing on the screen.

The problem is what I really want to see is the actual screen the user is seeing. In essence, I want to log the HTTP response of every HTTP request and attach this to my logging, therefore, for every URL requested, I can actually see the rendered HTML output. There are some minor limitations, of post processing Javascript for example (an AJAX application would not really work here), and I’d definitely want to catch the response from the application server and not on the client (who knows what additional work a browser might do to your code).

So the delimma, is how can I achieve this currently using Apache Tomcat. I’m assuming a generic option would not present itself independent of application servers.

My search continues.

What do you want in a job?

Well it’s a good question, as part of sorting out tax documentation I came across the following list (go figure why it was with tax stuff).
I made this list sometime mid 2002, I remember specifically the job I made the list for to evaluate. I’ve yet to take adequate time to completely review each point carefully and revise, however it’s a good starting point to look back at what you thought previously.

This list is not a list of what I want in a job, more a checklist of what I would like, and at the end I can get a score of how well the job fits to these items.

  1. Use of Existing Core Skills – Technologies
    • Oracle, Unix, Java, HTML/Web
    • Open Source Projects (Apache – HTTP/Java/XML, My SQL, PHP, etc)
  2. Use of Existing Core Skills – Experience
    • Database Modelling
    • Large Systems Design & Development
    • Application Performance Tuning
    • Technical Problem Solving
  3. About Newer Technologies
    • Encourage/Embrace use
    • Not bleeding edge
    • Opportunity for learning new relative skills
  4. Work Environment
    • Equipment
    • Location
    • Core working hours
    • Work at home options
  5. Team Environment
    • Ability to undertaking varying roles (and not undertake others due to other team members)
    • Existing working relationship with people
    • Across Development/Management/Support Teams
  6. Project Environment
    • Durations – 1 month – 1 year
    • Challenging
    • Varying
    • Problem Solving
  7. Project Methodologies
    • Emphasis on Quality Procedures
    • Emphasis on Software Testing
    • Emphasis on Customer
  8. Renumeration
  9. Other
    • Some Fun

Linux Format Reader Awards 2006

The Linux Format magazine is having it’s annual reader awards in a number of categories.

These include (I’ve include my picks after each category):

You can nominate at http://www.linuxformat.co.uk/nominate/. Nomimations close Friday 10 Feb 2006.

What's in a Wiki?

I’ve been wondering what exactly is a wiki? It’s also not the first time I’ve been asked myself. Here is an explaination I came across.

A wiki, from the Hawaiian term for “quick,” is an ongoing, ever-evolving, organized compilation of information.

The theory behind wikis is simple: One know-it-all is not enough; more is better. Unlike other Web sites created and managed by a single person or entity, wikis are truly for the people, by the people. But the model has also come under fire, underscoring the danger of wikis: it’s open to everyone, including pranksters and those with an ax to grind or want to revise history in their favor.

Wiki’s enable a mass collaboration to build all kinds of things. It is becoming a new model for doing things on the Internet. The software is not magic, you can’t throw up a wiki and hope a miracle will occur. It takes thoughtful people to make it run.

Wikimania is growing dramatically. In October, 16.3 million people visited Wikipedia, a 267 percent increase from the same month last year, according to Nielsen/NetRatings. Overall, the site has compiled 2.6 million articles, including 840,000 in English.

Windoze and Microspew at it again.

In reading the Google News Sci/Tech headlines today, near the top was the following article.

Windows Security Flaw Is ‘Severe’ Washington Post – 36 minutes ago
A previously unknown flaw in Microsoft Corp.’s Windows operating system is leaving computer users vulnerable to spyware, viruses and other programs that could overtake their machines and has …

So I’m going to ask two stupid questions.

  1. When is a Windows Security flaw not serious and will not leave computers vulnerable, blah blah blah?
  2. When are these articles going to end the monopoly of the desktop and start actively promoting viable open source linux alternatives every time these articles are written.

Makes you wonder about the media?

Adding to the Library Collection

I took the chance today to order some books from Amazon today to add to the library. Of course I’m still reading 2 current books Spring in Action and the MySQL Certification Study Guide in order to site the second MySQL Professional Certification Exam.

As with most things, you start off looking or reading on the web for something and you end up completely somewhere else. In this case, it was looking at Linux Software Labs (Australia) at the price of their Linux Distribution CD’s, which lead me to the book Beyond Java listed on their site. Called my local computer book store, but not being open (Boxing Day Public Holiday), lead me to go, “well I’ve been meaning to order some books from Amazon”, what were they again. So this lead me to coming up with a whole new list, and I figured for the cost of freight to Australia, I may as well order a few. So here is what I got.

Better, Faster, Lighter Java, MySQL (3rd Edition) (Developer’s Library) , High Performance MySQL,and of course Beyond Java.

The hard part now being the waiting 6-10 days.

Promoting a new Feed Icon

I found this site which is promoting the use of a unified feed icon. Firefox started it, IE is onboard, it’s time for us to back up around it. Spread the joy.

Feed Icons: A new standard is being established to identify syndicated content. You can help by adding this icon to your site. What’s it all about

December Java Users Group talk on AJAX

I attended the December meeting of the Brisbane Java Users Group last night. The presenters Alex and Brad from Working Mouse a Brisbane Based J2EE Solutions Provider gave a talk on AJAX.

What is AJAX? It stands for “Asynchronous Javascript and XML”. While the name has stuck, it both does not require Asynchronous communication, nor need to use XML, at least the Javascript part stays. AJAX is also not a new language or technology, merely a collection of technologies grouped together to provide a given function, which is to provide rich feature in page functionality within a web browser. The presentation centered around DWR – Direct Web Remoting implementation. There are in fact a number around in various server languages.

Let me explain some more, providing dynamic content on a website is straight forward, when you request a page, however to provide dynamic content within a page without refreshing the page (and in turn keeping all page state) is not a feature of the HTTP protocol. The most obvious case always presented is when selecting a Country Select Box value, a Select Box of States is populated based on the selection without the user seeing both the entire page reloading and waiting for this. There are of course a number of examples of use.

AJAX isn’t new, infact the underlying requirements within AJAX, the DHTML, DOM manipulation and XMLHttpRequest were available in 1997 (as mentioned in the presentation by Brad). In fact, I implemented functionality to perform what AJAX does back in the late 90’s, probably starting 1999, using solely Javascript, and some of that is still in use today on at least one of my sites. Of course Google made this functionality popular with it’s use in Google Suggest a few years ago.

While the presentation was a good introduction to those that had not seen this in operation, the subsequent discussions over dinner prompted some strong reactions, which is good in our line of work.

This technology implementation is inherently flawed, primarily due to the reliance on a Web Browser, and being both a multitude of available browsers across platforms and more specifically a lack of standards adoption causes this technology simply not to be available for all users. Of course Microsoft Internet Explorer is a significant pain in the butt here, as it’s simply not standards compliant, and you are forced to write bad code to work in IE simply due to it’s market penetration. There are of course a lot more of concern, proxies at multiple levels of interaction can drive you mad, and the increases in bandwidth and server performance.

That aside, the issue of needing to provide this level of rich content within a browser is another very good case. This is driven by end user need, and ultimately it is rather ridiculous it’s complicated code, it’s yet another language within the application to support, and the support is difficult, it’s even more complicated to provide some type of automated testing. But I guess the strongest comments came from Max, who recognised me after 15 years. Max was a lecturer in my undergraduate studies from 87-89, a long time ago. I would place Max (not his real name by the way, it’s a long story which took some research at the time), as one of the top three lecturers in my studies that influenced my path to where I am today.

His points were totally valid, why oh why are we doing this, it’s just ridiculous this level of complexity, to do what a browser was not simply designed to do. I would tend to agree, we are forced again by the influence of Microsoft technologies on end users to provide a level of experience they have been brainwashed into. It so reminds me of the The Matrix movie, where everybody is living under the power of the machines (Microsoft), and there a small few fighting a rebel cause to show them what the picture really looks like.

XP Group in Brisbane

Brisbane has another XP Group. Just found out about it. Info can be found at http://groups.google.com/group/Brisbane-XP. I’ve been involved in some part in 2 previous groups in Brisbane.

I’m thinking about some ideas myself, I’ve got all the XP skills, however I’m now skilling up in Spring (a full-stack Java/J2EE application framework) and Hibernate (a powerful, ultra-high performance object/relational persistence and query service for Java). And I’ve got 2 other friends in similar positions.

Wouldn’t it be great if for 6 to 8 weeks, a few hours a week we could work on a project honning traditional XP as well as having some experience people in technologies helping others. Of course in comes back to some giving all to others, but I’m sure it doesn’t have to be that way.

Speaking at MySQL Users Group

I’m preparing to speak at the next MySQL Brisbane Users Group in Febraury 2006. My topic will be Know your competitor – A MySQL Developers Guide to Using Oracle Express Edition. You can get a full copy of my presentation slides at my Articles Page.

Having a strong background in Oracle, and having been using MySQL for the past 5 years, the release of Oracle Database 10g Express Edition (XE) as a Free offering (with limitations of 1 CPU, 1GB Ram, and 4GB disk) is an interesting move by Oracle.

I’ve written a number of recent comments on various Oracle/MySQL things including Responses to some Oracle v’s MySQL Questions, How can Oracle 10g Express Edition target MySQL?, Oracle 10g Express Edition Target Audience. Is it MySQL?, Oracle 10g Express, Free v’s Open Source and OFA.

The question could be posed, what relevence does this have to MySQL developers? Well, in some respects very little, but in others, both knowing about your competitor a little more, being able to see their offering, and in particular in comparision to MySQL can help in a level of understanding in Database differences. I am hoping that from the discussions, people will consider some approaches to design and development that is more “database compatible”, regardless of which database.

Will Oracle 10g Express Edition take off, well difficult question, there are many target markets, will it compete with MySQL in Open Source, hopefully my talk will sporn some discussion of peoples experiences in the various organisations and businesses represented in our meetings.

Upcoming Open Source Conference Presentation

I’ve been working recently on a paper I’m presenting to a conference in February 2006 titled Implementing Open Source for Optimal Business Performance. I’ve got the final glossy brochure yesterday so I now have something to show everybody. View Here (Be warned it’s a little bright)

The topic I have been asked to speak on is Overcoming the Challenges of Establishing Service and Support Channels. My notes are still in the early stage, but are available at http://wiki.arabx.com.au/index.php/ARABX_Articles:Overcoming_Service_and_Support_Channels

Conference Details on Ark Group Web Site

Review of Database Magazine Article – "The Usual Suspects"

In the “Australian Technology and Business Magazine” – December 2005 edition there was an article on comparing database products. Here are my comments, which I also plan to forward to the editor.

BTW: I’ve since also found this articles content on another site here. It seems that most if not all is the same.

In response to your cover story article “The Usual Suspects Four databases we suspect your business could be quite interested in.” which appeared in the December 2005 edition, I would have to sum up your article in one word “Disappointing”. Let me provide some feedback from my perspective.

You start by defining a scenario, which is the only approach you can take for a suitable comparision of database products due to diversity of features available in todays products. A good start, necessary to limit the discussion of features and functionality. However, you then specify some additional business requirements, for example “relatively small e-commerce” and “cost of the initial server and database software is certainly an issue.” Now, having worked for a number of small internet and e-commerce companies, you don’t have the budget for a Dell Quad Xeon processor machine, nor then the requirements for co-located hosting or dedicate networking bandwidth for your fancy new hardware as well as the additional staffing support costs. So immediately your scenario is more unrealistic.

The major sticking point I have is your 4 processor requirement. The most efficient and cost effective initial implementation is to lease dedicated servers, there are numerous reasons including cost savings, better hardware support, larger bandwidth capacity and easy growth path to start. You can also easily monitor growth and more quickly change needs then having a large initial hardware cost. I could continue regarding hosting, however this alone changes the requirements to using single or dual processor machines given your scenario. With this in mind the playing field is now completely different but a better reflection of the scenario. Your argument for “scale up to a small server farm”, also does not hold, because you can easily get economy of scale in splitting application server and database server, splitting OLTP and batch database requirements and other common practices, not to mention additional benefits such as redundancy.

My final comments on your hardware, specifically in relation to MySQL (including using Version 3), you can get significant performance from hardware given your small size requirements and even with modest growth on single and dual processor equipment. Other then opening remarks your article makes no further references to performance requirements or indeed any level of performance analysis of the products reviewed in this article.

You make scant reference to other database products, mentioning only one ‘Sybase’ in half a sentence in your opening and once again. In 8 pages, surely rounding the article to give a clearer perspective of the marketplace with even one paragraph to mention that there are many different database products both commercial and open source that service differing business needs. Other major products not compared at this time include Sybase, Informix, Ingres, PostgreSQL, MaxDB and Berley DB as well as many more.

Your choice of products is also not consistent or reflective of your scenerio. I’ll provide a few specific reasons. Firstly, you compare beta products against production products, if your criteria was current production products you should have compared using MS SQL Server 2000, however that would clearly provide a poor reflection in Microsoft due to it’s clearly dated product. If you allowed one beta product, why then did you not use MySQL 5.0 beta which was available at the time. While you have taken the effort to adjust your article to include references to MySQL 5.0, and you in turn choose MySQL as your editor’s choice, you should have been consistent throughout the article as you give mixed comparising referencing two versions of one product. Futher to this, you choose to use a dated Oracle product in 10g Release 1. 10g Release 2 has been available for a number of months. I would also question your decision to choose the more expensive Standard Edition over Standard Edition One, but this again could be soley due to your overspecd hardware.

If your rationale for including beta was cost based, then you did Oracle a clear injustice. You make again, only a half sentence reference to Oracle’s new released free product in the opening section. You mix more recent MySQL 5.0 information within your review of MySQL 4.1.14, yet you mention nothing of Oracle 10g Express Edition, for example it’s a free product much like Microsoft SQL Server Express, but also has similar limitation in 1 CPU, 1GB RAM and 4GB of disk but all the power and functionality of other Oracle Products, as well as default inclusion of web based administration tools with HTMLDB.

Your quick product summary (4 columns of information) suffers from a number of already mentioned points, however in relation to the only commercial product with a free offering, MS SQL Server Express, you clearly gloss over the limitations. 1 CPU, 1GB Ram and 4GB of disk is critical information, this should have been included in the product summary, you only go part the way. Regarding MySQL, should have clearly stated reference to $0 under GPL license. On that note, and mentioned in your detailed review, there are limitations in the distribution of MySQL within a commercial product and this is not in your summary.

Your article places no emphasis on performance or efficiency. Given your need to mention your testing on quad processor hardware, you make references to various limits of CPU across products, memory and hardware requirements as well as some generic maximum sizings, but nothing on performance, throughput and then growth potential, as this was part of your opening scenario.

It’s not possible to clearly date when this review of products was performed, granted the marketplace has changed rapidly in recent months, the fact that your article references Oracle 10g Express edition clearly includes changes were possible to the article in early November.

In an 8 page article, as mentioned you could have allocated one column to mention that the Database marketplace contains many more products. In particular considering you have included an Open Source product, and you selected this as product of choice, I feel this gives even more justification to at least giving credit to the emerging Open Source Database market. You actually place I recall only one mention to “Open Source” which is signifcant in the context of your choice. Other products would include PostgreSQL, Berkley DB, Apache Derby and even Ingres. While your article should clearly not need to analyse these at this time, by leading into this topic you provide clear opportunity for further discussion.

At the end of the day, while you provided a concise one page breakdown of features and certain limits, this technical information does not provide a clear benefit to an IT manager, or even a technical person.

Web 2.0 Design Patterns

In his book, “A Pattern Language”, Christopher Alexander prescribes a format for the concise description of the solution to architectural problems. He writes: “Each pattern describes a problem that occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.”

1. The Long Tail
Small sites make up the bulk of the internet’s content; narrow niches make up the bulk of internet’s the possible applications. Therefore: Leverage customer-self service and algorithmic data management to reach out to the entire web, to the edges and not just the center, to the long tail and not just the head.
2. Data is the Next Intel Inside
Applications are increasingly data-driven. Therefore: For competitive advantage, seek to own a unique, hard-to-recreate source of data.
3. Users Add Value
The key to competitive advantage in internet applications is the extent to which users add their own data to that which you provide. Therefore: Don’t restrict your “architecture of participation” to software development. Involve your users both implicitly and explicitly in adding value to your application.
4. Network Effects by Default
Only a small percentage of users will go to the trouble of adding value to your application. Therefore: Set inclusive defaults for aggregating user data as a side-effect of their use of the application.
5. Some Rights Reserved.
Intellectual property protection limits re-use and prevents experimentation. Therefore: When benefits come from collective adoption, not private restriction, make sure that barriers to adoption are low. Follow existing standards, and use licenses with as few restrictions as possible. Design for “hackability” and “remixability.”
6. The Perpetual Beta
When devices and programs are connected to the internet, applications are no longer software artifacts, they are ongoing services. Therefore: Don’t package up new features into monolithic releases, but instead add them on a regular basis as part of the normal user experience. Engage your users as real-time testers, and instrument the service so that you know how people use the new features.
7. Cooperate, Don’t Control
Web 2.0 applications are built of a network of cooperating data services. Therefore: Offer web services interfaces and content syndication, and re-use the data services of others. Support lightweight programming models that allow for loosely-coupled systems.
8. Software Above the Level of a Single Device
The PC is no longer the only access device for internet applications, and applications that are limited to a single device are less valuable than those that are connected. Therefore: Design your application from the get-go to integrate services across handheld devices, PCs, and internet servers.

What Is Web 2.0?

In his article What Is Web 2.0 – Design Patterns and Business Models for the Next Generation of Software Tim O’Reilly gives a very detailed description of these seven principles.

1. The Web As Platform
2. Harnessing Collective Intelligence
3. Data is the Next Intel Inside
4. End of the Software Release Cycle
5. Lightweight Programming Models
6. Software Above the Level of a Single Device
7. Rich User Experiences

Core Competencies of Web 2.0 Companies

In exploring the seven principles above, we’ve highlighted some of the principal features of Web 2.0. Each of the examples we’ve explored demonstrates one or more of those key principles, but may miss others. Let’s close, therefore, by summarizing what we believe to be the core competencies of Web 2.0 companies:

* Services, not packaged software, with cost-effective scalability
* Control over unique, hard-to-recreate data sources that get richer as more people use them
* Trusting users as co-developers
* Harnessing collective intelligence
* Leveraging the long tail through customer self-service
* Software above the level of a single device
* Lightweight user interfaces, development models, AND business models

The next time a company claims that it’s “Web 2.0,” test their features against the list above. The more points they score, the more they are worthy of the name. Remember, though, that excellence in one area may be more telling than some small steps in all seven.

Some of the information provided is very interesting, I will be waiting with interest to see if this term “Web 2.0″ becomes something, or not.

Myths Open Source Developers Tell Ourselves

Some interesting points from this ONLamp article on Myths Open Source Developers Tell Ourselves

Publishing your Code Will Attract Many Skilled and Frequent Contributors
Myth: Publicly releasing open source code will attract flurries of patches and new contributors.
Reality: You’ll be lucky to hear from people merely using your code, much less those interested in modifying it.

Feature Freezes Help Stability
Myth: Stopping new development for weeks or months to fix bugs is the best way to produce stable, polished software.
Reality: Stopping new development for awhile to find and fix unknown bugs is fine. That’s only a part of writing good software.

The Best Way to Learn a Project is to Fix its Bugs and Read its Code
Myth: New developers interested in the project will best learn the project by fixing bugs and reading the source code.
Reality: Reading code is difficult. Fixing bugs is difficult and probably something you don’t want to do anyway. While giving someone unglamorous work is a good way to test his dedication, it relies on unstructured learning by osmosis.

Packaging Doesn’t Matter
Myth: Installation and configuration aren’t as important as making the source available.
Reality: If it takes too much work just to get the software working, many people will silently quit.

It’s Better to Start from Scratch
Myth: Bad or unappealing code or projects should be thrown away completely.
Reality: Solving the same simple problems again and again wastes time that could be applied to solving new, larger problems.

Programs Suck; Frameworks Rule!
Myth: It’s better to provide a framework for lots of people to solve lots of problems than to solve only one problem well.
Reality: It’s really hard to write a good framework unless you’re already using it to solve at least one real problem.

I’ll Do it Right *This* Time
Myth: Even though your previous code was buggy, undocumented, hard to maintain, or slow, your next attempt will be perfect.
Reality: If you weren’t disciplined then, why would you be disciplined now?

Warnings Are OK
Myth: Warnings are just warnings. They’re not errors and no one really cares about them.
Reality: Warnings can hide real problems, especially if you get used to them.

End Users Love Tracking CVS
Myth: Users don’t mind upgrading to the latest version from CVS for a bugfix or a long-awaited feature.
Reality: If it’s difficult for you to provide important bugfixes for previous releases, your CVS tree probably isn’t very stable.

Web 2.0. Not to be confused with Internet2

What is Web 2.0? Well the definitions out there aren’t clear and precise. Tim O’Reilly from O’Reilly Publishing has a detailed description at http://www.oreillynet.com/lpt/a/6228. (More notes from this below) His compact description is:

“Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an “architecture of participation,” and going beyond the page metaphor of Web 1.0 to deliver rich user experiences.”

The Web 2 Conference (www.web2con.com) with the theme “Revving the Web” has some interesting content on the site.

Let me be clear, I didn’t know what Web 2.0 was 2 hours ago, I stumbled across an article Web 2.0 Principles applied by Yellowikis while research IT Outsourcing jobs in India/China etc (go figure). Anyway, the following summation prompted me to read about this topic a little more.

* Web-based (of course) and uses wiki technology; the same MediaWiki software that powers Wikipedia.
* Any user can both read and write content – adding business listings and editing them. To put it in ‘Web 2.0 wanker’ terms, it harnesses collective intelligence.
* Requires a significant amount of ‘trust’ in the users.
* Can be deployed via the Web in countries all over the world (see Emily Chang’s interview with Paul Youlten for more details on this aspect).
* Developed and is maintained by a small team (just Paul and his 14-year old daughter – both working part-time).
* Has fast, lightweight and inexpensive development cycles.
* Uses Open Source LAMP technologies (Linux, Apache, MySQL and PHP) – meaning it is very cheap to run.
* The content has no copyright and is freely licensed under the GNU Free Documentation License 1.2.
* Can and will hook into other Web systems, e.g. Google Maps. Indeed if it introduces its own APIs, then it will be able to be remixed by other developers.
* Relies on word-of-mouth and other ‘viral’ marketing.
* Requires network effects to kick in order to be successful (at least at the scale of disrupting the Yellow Pages industry).
* Yellowikis will get better the more people use it. The Wikipedia is an excellent example of this.

Taking a few lines from Tim O’Reillys detailed description as a quick taste for you to read more.

  • Wikipedia, an online encyclopedia based on the unlikely notion that an entry can be added by any web user, and edited by any other, is a radical experiment in trust, applying Eric Raymond’s dictum (originally coined in the context of open source software) that “with enough eyeballs, all bugs are shallow,” to content creation. Wikipedia is already in the top 100 websites, and many think it will be in the top ten before long. This is a profound change in the dynamics of content creation!
  • It is a truism that the greatest internet success stories don’t advertise their products. Their adoption is driven by “viral marketing”–that is, recommendations propagating directly from one user to another. You can almost make the case that if a site or product relies on advertising to get the word out, it isn’t Web 2.0.
  • 4. End of the Software Release Cycle – As noted above in the discussion of Google vs. Netscape, one of the defining characteristics of internet era software is that it is delivered as a service, not as a product. This fact leads to a number of fundamental changes in the business model of such a company:
  • One of the key lessons of the Web 2.0 era is this: Users add value. But only a small percentage of users will go to the trouble of adding value to your application via explicit means. Therefore, Web 2.0 companies set inclusive defaults for aggregating user data and building value as a side-effect of ordinary use of the application. As noted above, they build systems that get better the more people use them.
  • Contrast, however, the position of Amazon.com. Like competitors such as Barnesandnoble.com, its original database came from ISBN registry provider R.R. Bowker. But unlike MapQuest, Amazon relentlessly enhanced the data, adding publisher-supplied data such as cover images, table of contents, index, and sample material. Even more importantly, they harnessed their users to annotate the data, such that after ten years, Amazon, not Bowker, is the primary source for bibliographic data on books, a reference source for scholars and librarians as well as consumers. Amazon also introduced their own proprietary identifier, the ASIN, which corresponds to the ISBN where one is present, and creates an equivalent namespace for products without one. Effectively, Amazon “embraced and extended” their data suppliers.
  • Users must be treated as co-developers, in a reflection of open source development practices (even if the software in question is unlikely to be released under an open source license.) The open source dictum, “release early and release often” in fact has morphed into an even more radical position, “the perpetual beta,” in which the product is developed in the open, with new features slipstreamed in on a monthly, weekly, or even daily basis. It’s no accident that services such as Gmail, Google Maps, Flickr, del.icio.us, and the like may be expected to bear a “Beta” logo for years at a time.
  • …Support lightweight programming models that allow for loosely coupled systems…. …Think syndication, not coordination… … Design for “hackability” and remixability…

I could go on.

Other References
ZDNET Web2Con