The agile software development lifecycle responsibility

July 6, 2016 by ronald

The eXtreme Programming (XP) methodology places emphasis on a number of core principles for agile software development. These include (and are not limited to) the planning game, short and frequent iterations, testing, frequent refactoring, continuous integration, ownership and standards.

Identifying the problem

These core principles however are not the full lifecycle of software development. This is really only a portion of the lifecycle. What is lacking is the definition for the ongoing responsibility and ownership by the creators of software in the sustainability of said software for the lifetime of use and benefit to an organization.

An agile methodology approach (of which XP is just one) fails to expose and describe the full operational cost within development, testing and deployment. Just as a single line of code is viewed a hundred times more than the time is was written, the usage of that code in the full lifecycle of an organization is potentially a magnitude more investment of time and resources.

Software development is not just about new feature creation. It is also ensuring full product ownership and responsibility consistently. It is also ensuring that in a larger organization, compatibility and consistency can occur with other products. In other words, it is thinking of software for the whole organization, rather than the sum of individual parts.

Scheduling lifecycle management time

Development and engineering resources already apportion time between planning, development and unit testing. There needs to be a second more important consideration. An apportionment of time between product features, product stability and product maintainability.

A good assignment of time to cover the full lifecycle adequately is:

60% of time to feature design, development and product support (i.e. bugs)
20% of time in stability and sustainability management of the existing technology stack (i.e. refactoring and testing)
20% of time in overall lifecycle management of delivered functionality (i.e. ongoing ownership)

Conveniently this Pareto allocation can be seen as 80% for development time and 20% for time generally considered operations.

Sustainability Management

Remember the core principles of XP that included frequent refactoring and standards. How much time is spent on refactoring code to provide a better, more consistent, more testable codebase for an application after code is initially deployed? What about across multiple applications in your organization. Engineering resources rarely invest any time let alone actively scheduling time for code maintenance by the entire engineering organization, yet there are immediate benefits. It can be amazing how more performant a system is when unnecessary code is simply deleted from software that has gradually evolved over time. The compounding benefits can mean less code to view by developers and thus adding incremental efficiency. Less code to deploy also means smaller installation and application footprint. Particularly when the code is unnecessarily executed in the common usage path.

Engineering teams in general are more focussed on delivering new functionality or fixing issues with newer functionality rather than reviewing existing functionality for optimization, consolidation, replacement or removal. What about applying an improvement to not just one application, but multiple applications across an organization whenever possible.

There is generally at least one individual at each organization that has the attitude of “Do I write the line of code, or is there a better way?”, and “What code can be deleted as it is no longer (or was never) used?”. If all engineers considered, evaluated and implemented these concepts as a daily process, code would be more stable, it would be more lean. Does your organization have a recognition for the developer that has deleted the most lines of code from your production system?

The following is the example of a single developers improvement to a production system via deletions.
github deletions

Are there better ways of implementing functionality with the version of the technology stack already in use? Many times a newer version of software is used for one feature, but what other new or improved features also exist. This is a proactive measure to look at the features of the technology in use. This is a different type of refactoring, but the same concept in code reduction. A great example here is the use of an iterator design pattern rather than a loop. In initial deployment of an application, memory optimization may not have been obvious, however over time and increasing datasets this simple proactive action has a larger benefit for the application.

A final step in improving sustainability of the software is testing. An agile approach introduces unit testing, but testing do not stop with the validation of a single line of code. Testing encompasses how that functions with the entire system, often known as functional testing. Systems often require load testing to know the capacity before failure, not after it occurs. If as much time was spent in these two additional areas of testing, as was spent in unit testing, more robust systems would exist and the unseen benefit is the productivity to spent more time developing.

Here are a few customer examples of refactoring. Unfortunately this is an all to common occurrence.

Module bloat

An assessment of the technology stack for a newly deployed application (i.e. just a few weeks old) showed a long list of PHP and Apache modules. Without any justification as to why these modules were used, and without a willing engineering sponsor it took quite some time to first produce automated deployment duplicating this custom environment, than applicable testing to strip out what was ultimately unnecessary. The overall outcome had multiple effects. What was needed to operate the system was actually documented. What was needed was actually automated to assist in future deployments. The resulting software was more performant as it had less baggage. The resulting deployed VM image was actually over 1GB smaller after all bloat was removed. This improved the time to deploy new application servers. As this system had a very large scale up and scale down weekly, we are talking 1000% at peak times, the impact of a more lean stack had a huge impact on the true deployment times of the application. This is an attribute that can be difficult for developers to appreciate, when comparing a development environment to a production system.

This entire process and the large investment of work would have been almost non-existent if this was part of the engineering methodology used during initial development (which took over one year for initial deployment), and if more (or all) individual developers stopped to ask why are we adding additional modules. This is part of the infrastructure planning that should have a feedback loop within each iteration. This also requires both a solid experience in engineering and architectural oversight to be able to estimate the impact over a much larger time period than the development cycle.

Framework bloat

An education based client faced a huge problem. The existing system had grown over a number of years, the engineering department had grown from one developer to over a dozen developers, yet the approach towards software development had not changed from that single developer original module based Drupal approach for a small application. With sales for the next annual education cycle already 4x more than the current user base that was having regular outages, the system could not (and would not) sustain known future sales.

Often the first question asked by clients in this situation when offering performance services is “How can I scale my system 10x?” I generally counter this question with “How did you scale from when your system was 10x smaller to now?”. Aside from the interesting conversations around these responses, I often need to explain that performance is about efficiency, and this often requires a cultural change. I also generally quote one of my popular lines — “When reviewing the performance of a piece of code (or SQL statement); the first objective should not be to make it better; the first objective should be to eliminate it.” This is also generally received with blank stares and silence. Efficiency it seems perhaps is no longer taught or practiced.

As with most simple yet profound assessments an example of the clients production system can best demonstrate what inefficiency is. An analysis of the user registration process unveiled alarming result. This analysis that can happen in a very short period, e.g. an hour. In summary, 50 SQL statements were executed to register a new user to the system. A physical desk check (again foreign when you have to ask multiple people how do I print out something as a visiting consultant) of just the database access showed that with the present inefficient Drupal ‘node’ schema design, just 11 SQL statements were actually needed to complete the required task. That is, the code could be 500% more efficient and nothing has been tuned or scaled. The client needed at least a 400% immediate improvement. However, just explaining this did not convince the organizations c-level executives to reset poor development practices to addressing immediate and ongoing scalability (i.e. success of your startup). They wanted a more abstract approach, they wanted a magically sharded solution were simply throwing H/W (and $) at the problem made it go away without changing the engineering mindset. If you go back to the answer to my response question you find this is often the solution to get to the current point, that is add more servers, add caching, add read-only data access. This is not actually the solution but is adding complexity to the problem and making it more expensive to correct. In the startup ecosystem this is also known as a successful catastrophe. You reached all of your marketing and sales pitch goals, and your software crumpled under your unplanned success.

Was this problem just in user registration, or was it throughout the entire application? If looking at one common and frequent code path a 500% improvement can be made with 0% feature impact. Would that not indicate the problem exists elsewhere in the codebase. In fact, this example product was not even the classic RAT v CAT that is often a more compounding performance issue.

Further assessment of this one code path demonstrated that when an optimal schema design was architected for the purpose of the application, the number of SQL statements would be reduced to 5 (i.e. a 900% improvement). This is a significant performance and scalability benefit when using applicable architectural design and strategic planning. Performing regular architectural reviews by skilled resources in your business strategy can help to address development productivity regression long before they occur. A great architect never sees the true benefits of their work. It is a silent reward that their given experience, knowledge and expertise has an unknown financial value to an organization.

Lifecycle Management

It can be difficult to understand the impact of code in the full lifecycle of a software product in the 21^st century. Until individuals have seen the birth, growth, support, longevity and death of a system it can be impossible to understand the impact some lines of code have with one application and the interoperability requirements with other applications. When the waterfall approach for SDLC was still in active use this was possible with large scale projects over time. In the post tech boom age and with the use of agile methodologies the incremental development lifecycle hides a lot of important context for better assessment of true cost savings.

The introduction and increasing popularity of the devops and site reliability roles also attempts to hide what many large organizations and successful website have, that is a dedicated operations team. Tools have done so much to enable engineers to be more productive. Automated provisioning, PaaS and CI/CD tools seamlessly enable more (abstract) code to be written to provide that essential functionality to the end user. Automated testing has replaced design documents. Organizations developer systems without is a data model? All of these tools and techniques however do not replace the intelligence needed to operate a system over time, particularly for tasks including upgrades and integrations.

One simple concept can be implemented to assist in all contributors owning lifecycle management.

The first is the responsibility of a developer being paged when a production problem occurs due to the line of code they wrote. Being responsible accepting that in the early morning or weekend you may be needed to address a problem attributed to your individual work and a failure within an entire system may make the decision to consider the larger impact more prevalent. This is taking the XP principle of ownership and defining the time dimension to a period infinitely greater than the present iteration.

The following is a great tweet that shows this developer has heard of commenting their code, but not considering lifecycle management?

// When I wrote this, only God and I understood what I was doing
// Now, God only knows

Justifying the reallocation of time

In the 1990s the concept of adding a quality step to software development via means of code reviews and automated testing was seen as an impediment to productivity. This potential cost in lost productivity could not be justified. Why would developers write tests when there is an entire QA team to test new features each time the software is released? Today it is seen as an essential component for continuous integration and delivery and the testing is designed to test all functionality repeatedly, not just new functionality.

Assigning 40% of present development time elsewhere could be viewed as a loss of productivity because today projects do not have a start and end date and deliverables where a total cost of ownership could be more clearly calculated. Today, projects are a continual ongoing evolution, even the concept of cost projection simply does not exist and therefore could be stated as impossible to validate against. After more than a decade of working with startups at many stages of evolution, the cost of not undertaking stability and lifecycle management is a far greater longer term cost to an organization by an outside observer. Look no further than the much larger turnover of technology staff in today’s organizations. These resources have institutional knowledge that is lost to the organization. This information is rarely documented as a historical artifact and the reason why steps were taken cannot be inferred from what is presently the state of the current code (or even reviewing the code revision history). This cost is rarely calculated within the software development lifecycle.

Adopting ownership

Many organizations suffer from the clash of traditional infrastructure principles with the pace of accelerated innovation. This approach helps to better balance the responsibility particularly between engineering and operations departments and improves the workflow to producing better products to the business in the longer term and ultimately to those who matter, the customer.

When developers value the total impact of a line of code in the full lifecycle of the product or service, a different mindset leads to actually writing better code. This code results in being more efficient and the carryover effect is the developer is actually more effective at writing more subsequent code.

Testability

October 2, 2009 by ronald

If I was to provide one tip for organizations on how to implement a successful technology solution, I would state you need to ensure your product/software/system is completely testable. Independent on how you elect to test your system, the design of creating a completely testable infrastructure will enable exponential savings as your business grows.

You achieve this by implementing an Application Programming Interface (API) for all data access. Your goal should be to move away from technology dependence and towards a technology agnostic solution, your dependency is now your business specification. This does not mean you are going to expose this API to the Internet, your own applications are your first clients, your web site and your management reporting tools. Your website is just a client presentation of your most valuable asset, your information.

Creating an environment that enables you test and verify your information independently from how is renders in a browser, enables a complete level of possible automation for testing this component of your communication channel. While end to end testing is also necessary, this becomes more complex and is impractical if this is your only means of testing. The principle of any popular Agile methodology approach is around testing where one popular term is Test Driven Development (TDD). While you may not implement TDD, knowing and applying the principals enables testability.

As you continue to grow, you will realize you now have the infrastructure and ability to stress test your most important system features. It is a common misconception that testing is about ensuring your software works as designed. Testing should not be about what works, but what doesn’t break. The goal of testing should be to break your software. The ability to stress test your system is to know when your system will fail. This ability to predict can benefit you ahead of time. You do not want your startup to suffer a successful catastrophe where you meet all your marketing goals, but you system crashes, and while the “Twitter failed whale” is frustrating, this is one approach attempt to mediate a total failure.

A summary introduction to Agile

October 9, 2006 by ronald

Agile Development Methodology: – Most popular Implementations: Extreme Programming (XP), SCRUM, Crystal

Books Highly Recommended

Extreme Programming Explained Extreme Programming Pocket Book More books on my Library page.

Guidelines for managing embedded external project dependencies

July 2, 2006 by ronald

I’ve yet to find any Java project that doesn’t have dependancies on some other Open Source external libraries. I’ve yet to find a Java project that manages these external dependencies appropiately for support and integration at an enterprise level.

As with most projects, understanding an applying sound principles that scale will help you at a later date, and generally the cost of implementation is minimual at the start, but of course becomes more expensive when it’s really needed. The classic case is Version Control. For over 10 years, even on small single developer projects, I’ve used Version Control, it should be taught at university as an introduction to good programming design, it would greatly benefit software development and maintenance.

Back onto the topic of hand. Let’s use a moderate Java Web Based application, and for the purposes of this discussion the following Open Source external libraries are incoporated. Log4J, JUnit, Canoo WebTest, MySQL JDBC, Apache Commons (Collections, DHCP, Pool, HTTPClient, Taglibs Mailer). I could continue, but this will suffice for the demonstration.

It’s very easy for your project to include the appropiate jar’s such as (log4j.jar, junit.jar, commons-pool.jar etc), however this is where support and integration with other products fall down.

A Controlled Approach

You need to keep a seperate repository (under source code control of course) of your external libraries, and this becomes the source across all corporate projects. This is to include the following for each library:

The actual deployed jar
The matching source code of the deployed jar
Java Documentation of deployed jar

Versioning

Log4J is an example of an Open Source project that does version their jars, will many other open source projects do not. Why don’t they? Well one reason is to enable people to upgrade easily but simply overriding existing versions, and processes that have specific CLASSPATH’s are not affected. Generally today, implementations of software include all jar’s within a specified directory so I don’t see the problem.
Log4J gives in this example a log4j-1.2.12.jar for deployment purposes. When libraries do not include a version number, the are to be specifically added. This adds another small delemma of standards. The general practice is to use the hyphen ‘-‘. followed by the product version using the full-stop ‘.’, however there are projects that don’t follow this.

Version Recording

So now we have for each external library, an appropiately versioned jar, and matching source and documentation. This is the initial baseline. What’s needed is a simple HTML index that manages this information for use. The Index should include:

Product Name
Product URL
Repository Version
- Version Number
- Version Date
- Download Date
Latest Version
- Version number
- Date
- Comment

You may ask, why do you record the Latest Version, when the practice should be to always get the latest version. JUnit is a good example, the present version 4.x, requires a JDK 1.5.x deployment, and if your application is running only 1.4.x, then you can only use older 3.8.x versions.s mentioned earlier,

Management

Having an index of external libraries is one thing, correct use and management is the most important step. Let’s assume we have taken the time to download and document the required libraries from our example, and everything has been deployed into our first project.

Now, a 2/3 month task of checking for updated versions can be scheduled. Withing this process, newer versions can be downloaded and recorded appropiately. In our example, Log4J now has 1.2.13. Updating the external libraries repository is the simple part, the next step is to notify all coporate projects of the new version, and to encourage uptake. This may not always occur in a timely manner, but with at least this baseline in place and when there are issues, standardisation on the known coporate version is the first step.

Dependencies

Within each project libaries, a readme that details which versions of which external jars are included andwhen they were last updated from the repository should be done. Noting this information with the both the external libraries repository and the project repository provides a paper trail. In addition, should there be any exceptions, this is the place this information can be reported.

External Projects

Canoo WebTest is a good example of an external project that also includes other external libraries such as Log4J, JUnit, HtmlUnit, NekoHtml.
Problems arise when these products may use and implement older or unknown versions of libraries.

Internal Projects

Having internal projects that are dependent on other internal projects is nothing new in a large corporate enterprise. The problem arises when a spaghetti of undocumented dependancies causes a management nightmare. Let’s take this real life example.

Product A has included jars of Product B, Product C, Product D and Product E. The Product C actually has a different version of Product D embedded within in. Product D which now is included twice also includes Product E, so there are now three copies of this, all are different in size, and all a binary only with no version numbers, and no corresponding matching source code. Does this sound bad? Well it is. How it ever worked is still amazing.

This mess could have been managed first with Version Control (a basic 101 in software development), and an appropiate management of external libraries, and a similar approach to internal libraries.

An Example

This is a great example to highlight the cost in lack of appropiate management. I was supporting an existing large scale project (1000+ users) (let’s call this Product A), and the integration of a new project (let’s call this Product B) had been passed from the development team for implementation, testing and release. A threaded process, it would simply just hang after some initialisation, no notice, no errors, just nothing. Not withstanding that something should have been better reported for the errors. Due to 7 possible log files between the software application and the application server, nothing was reported, but that’s another topic.

The final result was Product B had introduced the use of org.apache.log4j.Logger.trace(), a new more granular logging then debug(). The appropiate Log4J jar had been included in the product, and this was Version 1.2.12. However, Product A, which was using Product B, was bundled with an earlier version of Log4J, Version 1.2.8, and this version didn’t support this new method.

While it took a few hours of debugging to find this problem, it was made easier because at least these jar’s were version, of the 20-30 jars across products only 3-4 were versioned. Similar problems with QName and XMLBeans unamed jars prior to this took days to resolve (indeed one had to be worked around as it couldn’t be resolved).

A further complication in this process was when Product B was introduced. This was developed and built under Linux, while Product A was still being maintained under Windows. From the experience of Integration is was found that the order of loading within the classloader of a commerical application server differed between operating systems.

What constitutes a good error message to the user?

June 19, 2006 by ronald

Today, will go down in my professional history as quite possibly the lowest I would ever think of a software developer. I’ve carefully avoided the term “fellow coder”, speaking of a IT industry sticking by fellow IT people, but not today.

I presently support an existing production system, some 1000+ users that’s been in place around 3 years in which I’ve had no prior involvement until recently. Integration with other external parties within the system have provided an esclation in errors being reported in this external communication, and lack of adequate feedback to the user is another topic. Email is the means of reporting administratively of user errors, again another topic of issue. Within these emails, which are almost impossible to manage due to the limited internal GUI only toolset and lack of access to actual email account files to automate scripting (yet another topic? Do you see a trend here), is some relevent information regarding the transaction, and then most importantly the error message. The thing I need to investigate the problem.

The line reads (removing some stuff):

Error Code: 234567892

Ok, well a little cryptic, but surely you can work out from the external system what this means. Investigation of some more errors, in the mail GUI product, yet another series of open windows (you can’t view messages like a regular mail client with a summary list and a detail panel), provides for a trend in the error messages:

Error Code: 1235492567
Error Code: -434783134
Error Code: 34345199

The trend being there is none. Of course today by mid morning the email error count is into the hundreds, and I’m none the wiser. Well time to closely investigate the code management (as I’ve already contacted the external party, and asked if I can provide some error codes to receive greater information).

The following constitutes the two lines of code taken in the determination of the error messages I’ve shown so far. Note, this code takes the external system response, and then “attempts to determine usefull error content for presentation back to the user”.

errorNo = new Random().nextInt();
error = “Error Code: ” + errorNo;

Now while everybody laughed out loud, including fellow developers, DBA, IT manager, the Business Owners and Users (which can’t read code but could understand the first of these two lines I highlighted), and yes it really was funny in context with a bigger picture, but really it wasn’t funny at all. Some things make my blood boil, and this was one of them. With all the time lost between multitudes of people, users, call centre etc, I’d never felt a stronger conviction to hunt down the developer that wrote this.

The end of the story is after even trolling old CVS repository entries I was unable to piece sufficient information to determine the author. Most likely done pre version control, and then that trail leads to only a few names I’ve heard mentioned before.

I’d like to see somebody top that one!

The definition of a Unit Test

May 7, 2006 by ronald

A Test is not a Unit Test if:

It talks to the database
It communicates across a network
It touches the filesystem
It can’t run the same time as any of your other unit tests
You have to do special things to your environment to run it (e.g. editing config files)

Could not have said Agile better myself

April 25, 2006 by ronald

I’ve just attended Scott Ambler’s presentation on Agile Database Techniques: Data Doesn’t Have to be a Four-Letter Word Anymore at the MySQL Users Conference.

There is so much content on the topic, it’s impossible to present so much information in a short 45 minute session. I can speak with authority in regards to the same problem of condensing so much content given this issue with my own presentation MySQL for Oracle Developers.

I ask this question. Why is common sense considered such a radical approach? I state this because Agile Methodology approaches in so many ways are common sense, but “traditionalists” (and I use this term for several groups of existing IT dinosours), see change and continual improvement approaches as potentially evil, while they are constantly just trying to stay afloat daily with bloated, inefficient and overly complex legacy systems (I had to throw in several daggers at the same time, couldn’t resist).

This presentation echoed both a lot of my experiences and part of my writings and current projects. I have been a database modeller for over 16 years, and I have worked with Extreme Programming Agile Methodology for now over 6 years. Here are some abstract and unstructured bullet points from the presentation and some of my own comments intermingled. (Unfortunately due to other discussions I missed the initial 15 mins, however given the content and my own professional standings I can only fully understand what the initial content was).

Agile Modeling Driven Development AMDD
Agile Data Modelling
Iteration o (zero) should consist of an Initial Domain Model to provide scope and visibility of the bigger picture, but only a higher level view. Any extended time spent is wasted time.
Why use this approach to data modelling? To handle change efficiently.
There is a clear lack of tools and techniques in Automated Testing and Code Coverage concepts specifically for Data. See my thoughts on this at the links at the end of this entry.
Incremental software development enables a production rollout after every iteration
There is movement to a Data Modelling Standard based on the UML Notation
Transitioning approach to schema changes. Interesting concept, not necessary in a new project with Test Code Coverage, but essential for Legacy systems for a low risk approach using small steps.
Stop talking about data quality and actually doing something about it. Bad data, or more specifically must have migrated data, that breaks all the integrity in a new system has been the bain of my experiences in large data migration projects.
Generalising Specialists

Scott threw in a lot of Agile terms, and for most people in the audience I observed that quite a few of these terms were indeed foreign. I could easily see the potential for a Talk on Introduction to Agile – Applying to MySQL Projects.

One term I didn’t hear was YAGNI – You Ain’t Gonna Need It. This is an essential XP principle which in some ways sums up the common sense approach to software development. Don’t even thing about it until it’s made it’s way to the top of the customer requirements for the current interation.

I’ll also be getting the book “Refactoring Databases – Evolutionary Database Design” that was mentioned, I’m keen to read in depth more of the principles I so much promote myself. Sometimes I feel quite isolated in this area of Data Modelling colliding with Agile Methods.

I’ve written previously content that both re-inforces a number of points of this presentation and also complements Scott’s presentation in a number of ways. These include Unit Testing A Database and Database Modelling within an XP Methodology.

I recall this quote from a tee-shirt once owned, and I think is valid for certain IT professionals that continue to cling to traditional approaches in data modelling. Perhaps I should put it on an Agile slogan shirt for myself. Evolve or Die

In closing, I’ve been wanting to write a paper for quite some time titled “Better Productivity and Quality. An Agile Approach” to share my experiences. I’ve haven’t been able to put my thoughts down, having two other major writings in progress at present, but this presentation has only renewed my vigor.

Why IT professionals get a bad name

April 5, 2006 by ronald

Sometimes you just can’t find words to describe bad code, and if you are forced into maintenance it can be a mindfield. I’m presently supporting an existing deployed Web Based Java application, which I’ve had no involvement with previously, and for lack of any complements it’s absolutely terrible.

How this code was ever written, and of course never reviewed (that’s another talk on software development) one can only shutter. If it was an isolated case, then perhaps, but the code is bloated, and riddled with unnecessary complexity. Basic 101 practices such as Standard CSS, well formed html pages (ie, not with double <html> and <body> tags), and Javascript functions that are not duplicated up the warzoo are things you can discover spending 2 mins looking at just the first screen before you even look at the Java Code.

Here is an example.

tempKeyStr = tempKeyStr.substring(0,2).concat(String.valueOf(Integer.parseInt(tempKeyStr.substring(2).trim())));

Ok, I guess I should give you a hint of the type of values in the tempKeyStr prior to this wonderful transformation. This string contains a 2 alphanumerical character prefix, then optionally a space, then a numeric number (usually 2, 3 or 4 digits). An example may be ‘XX 999′ or ‘AA99′.

This will help you to identify what this lines now does?

Ok, well to save the suspense, the purpose of the above line of code is to trim and squeeze any whitespace from the string. So something like the following would surfice.

tempKeyStr = tempKeyStr.replaceAll(" ","");

Of course refactoring would totally eliminate this line, placing the replaceAll syntax in a previous declaration.
Those observant Java guru’s would also state, but what if this was written 5 years ago, replaceAll is only available since 1.4. Yes, correct, but Pattern has always existed, replaceAll is just a convience wrapper for this, and this type conversion from String to Integer to int to String is unnecessary complexity.

On that note of refactoring, I was sent the following link recently Refactoring to the Max. As the title suggests, sometimes people go to far.

Data Modelling

March 26, 2006 by ronald

I’m a data modeller. I specialise in this, and for a number of years on large projects I’ve been able to focus on this single task within the System Development Life Cycle of software development for several months at a time. Unfortunately what depresses me the most, is I can’t get a full time position in what I’m an expert in. It’s not a specialised skill that an organisation can use on a full-time basis, unless it’s a large organisation, and quite frankly, Brisbane isn’t a market that can support the diversity of large organisations. (caveat, large organisations that are proactive in software development, not just large organisations that have significant IT requirements, but do not work proactively). This is why I can also do Software Development, Database Administration, and even System Administration. Again, I’m not good enough to fill one of these positions in a larger organisation as an expert, but I can generally hold my own, usually even with surpising results. (Side note, even this week, I was providing a possible solution and tool for system adminstration across a large organisation, and it was 5 mins work. Something the paid full-time system administrators were not providing????)

I only started looking at Domas Mituzas wordpress: friendly look at query log. I didn’t have to read far to see where it was going, and well I quite quickly turned off, sorry Domas, I’m sure your concluding points were valid. This is my point, and it has been echoed in our local MySQL users group as well, the lack of appropiate database design in open source projects. There are several contributors, but one I put down to the “Hobbist and the Professional Syndrome”. A topic for further discussion, but in summary here are the bullet points from a slide in a presentation I prepared.

Hobbist

Downloadable software and examples
Online tutorials
Books like Learn in 24 hours/For Dummies

Professional

Formal Qualifications
Grounding in sound programming practices
Understanding of SDLC principles
Worked in team environment

Middle Ground Developer

Time to skill verses output productivity
Depends on environment and requirements

And on a final note, I guess why doing some raving. I find it criminal that organisations encourage at times a level of incompetence by promoting people that develop bad code into positions where they can continue to ensure that bad code stays, and further business decisions only engrain an organisation down the continued wrong path. There is already enough poor software developers out there that give the industry a bad name, but the good ones are few and far between.

What do we do, how can we solve this problem? I don’t think it can be solved now in the Open Source community. Adopting an Agile Development methodology such as Extreme Programming (XP) for example, it a very good start in organisations, something I’ve been working with now, or working with the principles for 6 years.

PS: Modelling is actually spelt both Modeling and Modelling (2 l’s) across the various English derivitives. Just incase somebody wanted to make comment.

Waterfall verses Agile in Software Development

February 6, 2006 by ronald

There appears to be a resurgence of Waterfall Approaches. This upcoming conference, you can find details at www.waterfall2006.com raises some good points that are worthy of review based on the comment that ‘agile is so last decade’.

Well worth a quick look at the agenda and tutorials being proposed.

XP January Meeting

January 25, 2006 by ronald

The Brisbane XP Group met yesterday for a presentation by Dr Paul King of Asert on the book Sustainable Software Development : An Agile Perspective.

I found it a good time to get a collective opinion and review of the techniques and methods we are moving towards in Software Development. Indeed one key point better describing Pair Programming has been added to my upcoming conference presentation Overcoming the Challenges of Establishing Service and Support Channels. I’m hoping Paul makes his notes available as a review of this book, that I will also mention in my presentation.

In Review, this is some of the key points I got from this presentation.

Software gradually degrades over time, and will become a maintenance nightmare
Successful software will be changed again and again
The IT industry has a problem historically with credibility

So the goal is to move towards Sustainability. Some of the points mentioned by Paul were:

Continual refinement
A working product at all times (not just working software)
Value Defect Prevention over Defect Detection
Additional investment and emphasis on design

On point I struggle with is Pair Programming. I don’t struggle with the concept, it’s great and really works. The struggle is selling Pair Programming as a core XP Principle. Some good points of discussion lead to a better angle.

Pair Programming – should be de-emphasised as a key point
By selling Defect Prevention and using Continuous Code Reviews as one method of implementing this
Continuous Code Review are achieved with Pair Programming

Much easier. The key point is management understands the term Code Reviews, and if you can show the effect of Defect Prevention on support costs, using Pair Programming, Refactoring and other techniques, your sales pitch will be easier.

Also for reference, the book Software Craftsmanship: The New Imperative was mentioned as a book with similar ideals. A third recommended reading book that was mentioned at the meeting was The Pragmatic Programmer: From Journeyman to Master.

Book Review – Beyond Java

January 21, 2006 by ronald

Well the title got me when I decided to purchase this book “Beyond Java – A glimpse at the Future of Programming Languages”, however perhaps it should have been titled “Why to move from Java to Ruby” as the book for a good portion is an explanation of how Ruby solves the problems that Java has and the direction Java is moving. While the book did describe where Java was, and the future limits and what to look at Beyond Java, the high use of Ruby to describe these overwhelmed the book. In fact, only the last chapter of 20 pages gave an comparison of “Contenders” as the chapter title described other then scant descriptions

Initially I lost count of the number of times information regarding C++ was repeated in the book, and how Java got it’s great penetration from the C++ community. I almost put the book down after the first few chapters, it was highly repetitive.

However, given my increasing interest in Ruby I was able to work though this. I could see a Java developer that has already discard Ruby as a fad to put this book down. In fact, as a Ruby reference it provided some good tips, again strengthening my comment of including Ruby in the title.

Overall an interesting read, however for a small book it could have offered a lot more.

On the same topic, some interesting points in the article The Problems With Java.

Unit Testing A Database

January 15, 2006 by ronald

In a recent job interview I was asked the question regarding Unit Testing/Automated Testing of a Database? An interesting question and indeed an interesting problem. I thought it was a good topic to describe what I’ve done in the past, and where I would go for a more complete testing environment given the opportunity of a entire XP project.

This is the approach I have implemented successfully in the past. It’s not a complete solution, however at the time with the client it provided appropriate coverage.

I don’t use a framework such as dbUnit to load data via XML, or specifically test data. XML is ugly to store data, and also with maintenance and comparison. I start with a pre-configured database of representative sample data, refer to my notes later on this, and then I use the tests of the application to perform the necessary data manipulation. This ensures that you are as close as possible to testing actual situations, and ensuring that any issues the application does (such as enter bad data, or RI failures) are caught appropriately.

Within this process an automated build test would first reset the database to a known set of data. I’ve also found that this helps as you can also recreate the schema if necessary. As part of Schema Design in an XP Development, I have two ways to create the database schema. You can create it from scratch (so there is always appropriate SQL to create the current version of the Schema, lets say BUILD_102. Alternatively, you can always upgrade between releases, for example between BUILD_101 and BUILD_102 with the appropriate upgrade scripts. Upgrade or patch scripts only move one version to the next version. It’s not possible in a production environment to simply recreate your schema for each release, however for testing and training you can. It’s also pointless after 50 releases to have to perform 50 patch releases from the original source schema for every automated build test.

This does lead to two paths necessary for creating a schema, but this can also be tested adequately in an automated way.

I also split my application tests suites that use the sample data into two buckets. Destructive and Non-Destructive. The reason is the Non-Destructive tests (i.e., non DML statements) can be re-run as many times as necessary. The Destructive tests (i.e., DML statements) can only be run once, before the database must be restored. Of course you can have the approach of setUp() and tearDown() within JUnit however it’s cleaner if you can extract this somewhat to a higher level, making the Unit Tests easier. By also running tests that don’t continually use the some data, or builds the data though test execution, you get a better coverage of different data sets. To give a few examples, You could create a Test that created a row of data, then edited the same row, then deleted the row. These are indeed valid, but if the first test fails, how do you know if the update and delete tests are also broken, they are by dependant by default and will fail, but did they really. If with your sample data, you created a new row, edited a different pre-configured row, and also deleted a different pre-configured row, you could eliminate the need to dependencies.

Now of course, there are situations where data must be specifically checked at the database level, for example it may never be displayed in your application, it may be intermediate information that is then summarised for display, or internal audit information held against data (for example Create User, Last Updated User), or data created by procedures or triggers. There are also situations when even within an application testing that can verify the data in a User Interface, you want to verify this at the source.

To this end I have a custom written JUnit extension that can perform specific SQL statements and comparison. I’ll need to write about this and provide this at a later time. (when I can dig it up)

Sample Data

Sample Data in the database is pre-configured, not in XML files, but so it can be managed by more primitive means, either by a database GUI interface or via SQL flat files or even text files. Why have pre-configured data this way? A few reasons.

It’s not coupled to your tests in any way, so it can be reused, for example as Training Data.
You can use database specific tools more easily say in loading the data in a relational way.
You can use the same database specific tools to export the data easily, if say you use an application to modify certain information.
You could more easily incorporate legacy data that is also being migrated if you use the same database specific tools.

Granted XML is universal in it’s data representation, it’s more self descriptive, but it’s a really pain to edit manually, and it’s very verbose when there are simply more primitive methods of this type of data management.

So we are creating a pre-configured data set, and an extensive one when possible. As I mentioned, the re-use capabilities for training or demonstrations really works.

Training Data

I have successfully with a number of systems, specifically CRM implementations used a Cartoon Environment for the sample data. There are a few reasons for this. First, most people I’ve ever met can related in some way to some set of the data. If they can’t, then there can read info online, or watch a movie etc, and get an appreciation from the representative data set, effectively I’m leveraging of the time and effort of others here, much better then a non-descript set of data.

You have the cartoon characters (e.g. Mickey Mouse, Donald Duck, Daffy Duck, Marvin the Martian, The Simpsons, The Flintstones, The Jetsons), use all the streets and rides as Disneyland as addresses, the animators as the users (e.g. Walt Disney, Chuck Jones, Stan Lee), you can use the different studios (e.g. Warner Bros, Disney, Pixar) for different states or countries, you can use shows or movies (e.g. Toy Story, Shrek) to group characters in other ways.

With this type of data, common attributes such as birth date, family units, nick names, people deceased etc, are all part of the available data. It’s surprising how much information you can find when using The Simpsons for example, of full names, addresses, interests etc.

It’s impressive when the CEO of a company is showing the application to overseas business partners, when his knowledge of the application (from his management perspective) is sufficient, there is no knowledge of the data really necessary to use or explain as it’s commonly used and generally understood.

At this point I would like to ensure that I correctly acknowledge the registered trademarks of Disney, Warner Bros, Hanna Barbera, Pixax, Dreamworks and that I am not using the names for any profit.

Summary

So this is what I’ve used in the past. What would I do in the future if I was charged with bullet proof testing of a database, even independent of an application, effectively 100% test coverage of the data. Well, this is an unproven approach, but I’d relish the opportunity to give it a full blooded test one day.

How to test the database with an automated test approach.

I’d consider the breaking down of testing into 3 areas. These being:

Schema
Data
Business Logic/Referential Integrity

Each of this is effectively built on the the preceding points.

Schema

This would be quite straight forward it’s a flat comparison between schema’s, which could be managed via the appropriate products data dictionary tables using SQL. You could even simply compare 2 schema’s in a few simple SQL statements. You could also use the approach of export the schema definition, and then compare flat files. You will find some downsides to these approaches, ordering is a big thing, columns within a database table, or the order of the tables that are exported may not be guaranteed. However given appropriate standards are defined used of tools this comparison could easily occur.

Being able to verify patches between releases, and full installed schema’s are also possible. The schema is the easy part.

Data

Data could be tested in varying means. Counts, sums and sample comparisons, but it’s also just data, why not md5sum the entire data. Why not even dump the data to flat files, and use basic difference tools for comparison. One simple approach. Especially if you are loading data, using or manipulating it, then you can export and compare at a file level. This would work very well for data considered Read Only for the life of testing.
This format may allow you to compare data between two different database products, e.g. Oracle use for your transactional online processing, and MySQL use for Web Data or Management Summary Reporting application.

In order to test the data you need the schema, but how can you test the data without the business logic and Referential Integrity. Within MySQL you can easily disable foreign key constraints, or easily adjust the table type to a structure to ignore this syntax. This could allow you to run tests with and without Referential Integrity to determine the strength of your application.

Of course this is a static version of the data. Performing separate testing of DML statements directly against the database could prove a waste of time, unless your application was written in such a way, that your application database layer was a complete API. Still you would be simulating what your application is ultimately doing, so it could be overkill. You could apply the techniques of comparison with know results after a successful running of automated build testing.

Business Logic/Referential Integrity

The hard stuff. Well you could be half way there with adequate schema and data testing. At least you are then confident that core integrity exists.

The problem is also to do with application integrity verses database integrity.

Let’s take a percentage, it goes from 0 to 100. Now using MySQL for example, you would define this as TINYINT(3) UNSIGNED, giving you a valid range of 0-255, and by default a display characteristic of 3 characters (the (3) in this example is just beautification).

You application logic restricts the value into this column to 0 to 100. But do you enforce this at the database? Depends on your needs. If the application is the only way to insert and maintain data, then you could get away with it, if data can be managed from other external systems, you may have APIs that also need to manage it. What if you grant SQL access to DBA’s, could they accidentially mess it up. That song “It’s a fine line between pleasure and pain” comes to mind at this moment. I guess what I’m getting at here, in solely database testing you could easily insert 255 into a percent column and pass a number of data specific tests. I’d assume it would fail some as well otherwise your tests aren’t complete, but when using the application you could never test 255, as the client would never allow it.

There are a lot more issues in RI testing, Cascading UPDATE, DELETE rules for etc. And then when you work all that out, you have to start with Triggers and Stored procedures. I’m not going to spend any time here at this time.

Unforeseen Side Effects

Data is a strange beast. It’s the source of information, so I always like to go back to the data for comparison, however the lack of good data (most notably Legacy Systems Migration) can drive you mad. What good is it to have a new system but not be able to enforce an adequate level of Referential Integrity or Business logic due to incomplete historical data. In essence this has proven in the time I’ve also supported large systems, a good portion of the development cost in support, it’s bad data and/or the need for a simple application to have more complex rules to cater for so called incomplete data.

I’ll give you a trivial example. Gender. In the new system an organisation will always ensure they get the gender of a customer (let’s not wonder how they do it, it’s just an example). So the application is designed to support Male/Female, Reference Data may exist to translate M & F to Male and Female respectively for data storage efficiency. Check constraints, enumeration data types (which I don’t like) may exist. Reports may do side by side comparisons.

Now the company buys a competitor, and then gets their database of 500,000 customers, but they don’t record the Gender. Do you then relax all your great integrity? Do you introduce a gender of Unknown? But that’s only for display, the maintenance screens can’t allow you to select it, so you then need a different level of Reference Data management. Do you make an educated guess and correct the data? Does the customer do an expensive mailout campaign and data collection process to correct this information? So what’s the big deal anyway the customer asks? Well if it’s an organisation that sells hygiene products, you don’t want to sending out material on Shaving Cream to Women on your list and Body Wax to Men. However you can’t use that explanation to describe solely a database driven reason to the customer for the cost of introducing this data. How do you show business value to the customer, when they simply what the data available?

I know this is a trivial example, but if I had a dollar for every trivial problem that customers spent months on,verses the really hard problems, I be writing this from a much more comfortable and relaxing resort haven. (A ski resort rather then a beach resort)

Conclusion

So can you Unit Test a Database solely without an application? Yes. Would you want to? Maybe, to a certain degree. Depending on your type of data. If all your information is highly visible in data entry and data retrieval, you should couple testing more closely with your application. If your data is very generated and collated, rarely user entered but bulk loaded, such as Sensis information or GIS information, then dedicated testing all aspects of the database decoupled from the application could indeed make your application test easier, because it’s easier to identify bad data that the application creates.

Database Modelling within an XP Methodology

January 13, 2006 by ronald

In an eXtreme Programming (XP) Agile Methodology approach towards software development the absence of adequate database design, or the scant regard of it, with the assumption that a framework and persistence infrastructure will take care of that can be a disaster in a larger enterprise solution. In essence it’s a scaling effect. The smaller the system, normally the smaller the number of users, amount of functionality and volume of data does not show the inefficiencies in database design as they can be masked by acceptable performance. But scaling up the system, or designing a large enterprise system the effect will become multiplied quickly. Of course using solid XP practices, the ability to make changes and integrate will be easier, but the amount and complexity of changes may be significant.

A more pragmatic approach is necessary in Database Modelling and Design, especially in a larger enterprise solution when using XP or an Agile approach. Assuming that the choice of Relational Database/s has been chosen, greater care is necessary and advanced preparation and planning required. Purists could argue YAGNI, but ultimately the customer will be distraught if the system is perfect in functionality and user interface but can’t handle the performance a production load of users or gradual growth of users when all is well in testing, demonstrations and customer training. The other catch is the need for additional disk space added monthly due to unknown requirements.

Additional considerations such as legacy systems data migration, database sizing, database growth, performance requirements, number of users etc can’t conform to a traditional XP approach. These tasks require varying lead times, for example the purchase and configuration of hardware and software need to be augmented with the XP approach.

XP is not for every project, and in the number of instances I’ve been involved with, certain considerations due to the environment, customer and usually management are necessary to be adjusted or tailored for the specific project. I’m not stating that an XP approach can’t apply to large scale enterprise Database Modelling, more that some adjustments particularly within the Planning Process are needed to integrate a more balanced solution.

The database is the foundation, I draw an analogy when discussing with friends to building a house. If you don’t have the core foundation correct, the slab, the essential primary fittings of plumbing, power etc, and a floor plan of key, important and known things, you will forever when building your house be spending additional time and resources to prop this up, taking away time, money and energy from the significant part of building the house so it can be ultimately used. (Thought: I wonder how you would build a house XP style. What’s the most important part for the customer. Something to ponder one day).

Are foundations perfect? No. Do they change and adapt? Yes, They do. However the cost is significantly higher, and the investment in getting it right the first time is invaluable.

I can’t count the number of times I’ve come onto an existing project, and the Database Designer for lack of a better word is a novice. It makes me shutter. It’s an expertise I’ve specialised in, however I’ve had to broaden my skill set, as this task is not used throughout a traditional SDLC project of 12-24 months, and it really saddens me when simple 101 mistakes are made and the downstream impact is significant, and management don’t know or realise the impact. Even worse is when I tell them what’s needed for a correction, and they look at the impact, and the decision is that this fixed known cost to correct is worse then any projected unknown down stream costs of maintenance and future development.

Ultimately I’ve got my way on projects more easily when I’ve also focused on Application Performance Tuning in projects, where I look solely at the application needs, not just the hard core DBA or System Administration tuning. A DBA doesn’t really care about the application and end user impact, they care about the figures in the database. In this situation, low level structural changes and associated costs are weighted against application performance ultimately necessary for the end user. I remember one project that took the development/user/testing/release team 3 months of work (probably ~ 200 man weeks of work) to implement and deploy a key structural change that I had identified and proposed as part of longer performance analysis period of this system. Of course when you had 6 full-time DBA’s alone, 33 remote distributed systems and 1000’s of users, the resultant impact and the future improvements possible were worth the investment. The need not to upgrade the hardware alone was worth it. This project was also over 10 years ago and a lot of techniques have changed since then. Anyway, back to the point of the discussion.

Let me give you an example situation where you use a traditional XP approach to software development, and how to weave in a more structured database design and modelling approach.

User Stories

As part of the gathering of initial User Stories from the customer, the Database Designer should be reviewing these and beginning to build a high level Logical Data Model. The Database Designer is not needed in any initial interaction with the Customer, however early reviews may clearly indicate gaps in stories due to historical experience that can then be feed back to the customer for considerations. Initially it should just be on paper, or a whiteboard, as initially this should correspond to the high level comparison with the User Stories and also the fluid nature of changing user stories. However it should highlight immediately known key entities, key relationships between entities, key integrations to external systems and areas that involve early input of volume estimates of data, perhaps with comparison to existing legacy systems.

For example, a rates billing system I was involved in had ~400,000 clients billed quarterly (4 times a year), and each bill had on average 10 line items (just looking a small sample, and getting a figure from the legacy system). Now it doesn’t take much to consider the size of tables which would record billing history (400,000x4x10) for each cycle, or GL reconcilation over 5 years (400,000x4x10x2x5 that’s 160 Million). Disk storage alone without any hint of the average row length, or indexes can be guestimated when a number of key entities are identified. Of course it could be out by a factor of 2, 5, or 10 times at this early stage, but not a factor of 100 times.

In a situation like this with any entity, even at the first mud map, I would be recording a number of indicators when possible. These would be:

Initial number of rows
Annual growth in rows
Projected rows after ‘x’ years

In addition I’d also flag each entity roughly for performance considerations by access, and also volume of transactions. This helps in identification of key performance considerations which may impact the Database Design in areas such as indexes (i.e. Disk Space) and schema optimisations after normalisation.

OLTP
Batch

For example, the billing process that creates quarterly bills and inserts millions of rows is both a Batch Process and a frequency of once every three months, as apposed to new accounts which is OLTP, happens at a regular quantity per day every day, and during the day.

Now, in the mud map these figures don’t have to be accurate, in fact in the billing system example, if the table didn’t have 10,000 rows it didn’t rate a mention, and 10,000 to < 400,000 was considered small. The limit here being the volume of the most key entity. Simply because a number of tables will be a factor larger based on this key figure, and this will swamp any insignificant tables. Again, this is week 1 of a potential 52 week project, it can afford to be vague in areas.

Spike

How do you term this in a XP environment? Call it a Spike. The purpose of a Spike is to explore options and evaluate risk. While in general the practice would be to throw away the results, in this case these Spikes should be kept and built on progressively.

Release Planning

As part of the Customer’s User Stories being estimated and prioritised within Release Planning, more refined Logical Modelling and even Physical Modelling is necessary. Of course until the customer refines priorities based on estimates, the actual order of implementation can’t be confirmed.

It is possible however a number of stories within an iteration may not relate together within the Database Model. How do you address this? Regardless of the actual attributes of these logical entities that are not available, a physical model representation can be built on necessary tables and relationships to ensure practical use of functionality.

Should the need for missing relationships within a Database Model impact the priorities of User Stories? No. Estimates should accurately reflect if additional work is needed for any given User Story, hence the Database Designer is necessary in the estimation process.

Iteration Planning

As part of the Iteration Planning it is key that the physical Database Requirements are in place to enable Writing Unit Tests and Writing Code for each Development Story for the Iteration. This should occur at the start of the Iteration, but is not before.

Daily Stand Up Meeting

This is the best opportunity during the Daily Stand Up Meeting to advise of structure impacts, or developers to request more involvement by the Database Designer in their specific task.

No system is perfect, and there will be times that the Database Model does not reflect the requirements by a coder for a task. At this time the developer is responsible to adjust their instance of the Database model to complete the coding task. The issue is; does the coder have the ability to modify the database schema when checking in? No. The responsibility for the Database Model is for the Database Designer, a difference to the normal coding practice where any coder can modify any code. The next section describes this reason in detail. At this point there is an inconsistency within the repository, and any automated testing for a build will fail. And this is what should happen. From the developers perspective their instance of the Code Ownership is correct, but for the entire iteration it is not.

One could argue this is not valid however, the cycle of Test, Code, Refactor, CheckIn while applying to the individual task, should also apply to the Database Design, however it should be at an iteration level not at a task level. To have your continuous builds breaking will however raise a lot of red flags, and I’m sure this approach will ensure that the tests which are broken as a result, will be corrected by the schema definition (the Code), there will be some refactoring if for example the Developer didn’t serve in the best interests of the bigger picture, the schema is checked in, and the continuous build bar is now green. Yipee. The bar is green, the code is clean, we can all go home now.

The Database Designer

The Database Designer should be dedicated resource in the development team for this task. For example in a team to 6-10 developers you would have 1 or 2 people. Even in a team of 2 your would have 1 person responsible. Unlike coding where any of the team pair up and work on tasks within the Iteration, the Database designer team is responsible for Database Design, like the Development Team is responsible for Coding. There are a few reasons for this:

They are ultimately responsible for the structure, the foundation, and this is the bigger picture that includes visibility and scope of further iterations and releases.
The skills are more specific.
They should also be a part/time developer of the team, so as to best understand the dynamics and also be part of the team.

Correspondingly, the end of the iteration should include an addition code review, that is schema related. A difference report of the schema definition at the start and end of the iteration should be compared, to ensure best practices in the larger picture as well as standards, optimisations, future performance considerations (e.g. indexes), future disk requirements (e.g. adding a new index to a 160 million GL table will take 4GB of disk space).

Other Factors

In fact as part of my upcoming conference presentation Overcoming the Challenges of Establishing Service and Support Channels I spend some time discussing Data Quality. Quite often this is the bane of support due to the complexities of software development to cater for data exceptions, or most commonly data anomalies due to historical data not meeting minimum RI and data specifications in your new system.

Book Review (Part 1) – Better, Faster, Lighter Java

January 11, 2006 by ronald

Well if the weight of the book has anything to do with it, it’s the lightest Java book I’ve got. Better, Faster, Lighter Java, which I got from Amazon, has been a quick read. I’ve done a quarter of the book (60 pages) in one bed-time reading. Some good information, I’ll provide a review when I’ve finished reading the book. What’s surprising that of the content that can be confirmed solely programming (i.e. the code), there were a number of errors in the book already. Here’s a summary of comments of what I’ve already sent to the publisher. (just showing the technical stuff)

Example 1-1. Counter example: implementation (pages 3-4)

Point 1:
Book: Mid page (page 3), you have public abstract Long getID();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following lines (page 3)
public abstract void setID(Long id);
public Object ejbCreate(Longong id, int count)
public void ejbPostCreate(Longong, int count)

Example 1-2. Local Interface (page 5)
Point 2:
Book: Top of Page (page 5), you have public abstract Longong getId();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line on (page 5)
public abstract void setID(Longong);

Example 1-3. LocalHome interface (page 5)
Point3:
Book: 4th last line (page 5), and 3rd last line.
Comment: Same comment as Points 1 & 2. ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line on (page 5)
public abstract void setID(Longong);

Example 1-4. Transparent counter (pages 13-14)

Point 3:
Book: On the second last line of page 13, you have private string name;
Comment: ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Point 4:
Book: On the first line of page 14, you have public void setName(long newName) {
Comment: ‘long’ should indeed be ‘String’

Point 5:
Book: On the fourth line of Page 14, you have public string getName() {
Comment: As per point 3. ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Figure 2-1 (pages 18-19)

Point 6:
Book: You list 7 points that correspond to the numbers in Figure 2-1. Point 7. Easier to Maintain
Comment: You have no point 7 in your figure.

Unreferenced Code (page 25)

Point 7:
Book: Second line of code in section. String prefix “This code is “;
Comment: You are missing a necessary assignment character = (equals) between prefix and “The…”

Unreferenced Code (2nd Example) (page 25)
Point 8:
Book: Lines result = result + “much, “; and result = result + “simpler, and neater.”;
Comment: While this is correct, it is indeed even more simpler if you replaced on both lines of result = result + with result += . It would not have really been work the mention except you are explicitly trying to demonstrate “simpler and neater”.

Unreference XML (page 32)

Point 9:
Book: In the middle of the code you have the line <include name=”**/*Test.class” />
Comment: While this is indeed valid, it would not work with you present example that you are indeed attempting to Automate with Ant. In your example, you define your JUnit Test Case class Name as TestAdder. This statement would not include the tests. It should indeed be **/Test*.class (with the third * ‘asterick) being a suffix to Test, not a prefix.

Unreference Code (page 45)

Point 10:
Book: Second line of coding section: Account valueObject;
Comment: You do indeed not use this variable, while not an error, it is unnecessary. Refer to next point for more information:

Point 11:
Book: Middle of code: account.setAccountNumber(…..), and the following line account.setBalance(….
Comment: You define no Account variable called ‘account’. So you do infact need a line of syntax: Account account; at some point. Even this however is invalid as you have not obtained an Account object, in order to use setter method setAccountNumber() and setBalance(), so you would need to have Account account = new Account(); This however is also invalid as your previous code (Page 43-44) which defines the Account class has no default constructor of no args. It is however I believe valid, as even though you don’t extend Object, you would get a define implicit Object no args constructor. I’d have to check that, but the point remains, it’s not clean sound code in regards to new Account().
And then to complete these 2 lines, setAccountNumber() is not a defined method of Account, indeed your explicitly don’t have it as part of your comments. (Page 43 Remebering your requirements, you want to keep the account number private, so you scope it accordingly, and omit the setter).

All that being said, you could do the following as an alternative to these lines.

Account account = new Account(result.getString(“accountNumber”), (float)results.getDouble(“balance”));
return account;

This could even be simplified further to simply

return new Account(result.getString(“accountNumber”), (float)results.getDouble(“balance”));

Nice and clean.

Point 12:
Comment, while on the subject of this coding example, given changes necessary, I’d make two other comments regarding this code fragement.
Firstly, you close your stmt variable with stmt.close() but you don’t close your ResultSet variable result. Good coding practice would close both, with appropiate error checking as shown with stmt.
Second, while this method is static, I would not choose to use a global Connection variable referenced as conn. This should be passed to this method as an argument.

Of course I’ve only looked at the first 1/4 of the book for some bedtime reading, and I haven’t actually taken this code and passed via a compiler, but I wanted to bring these comments to your attention. I am however enjoying the content so far of this book.

The Java Spring Framework

January 10, 2006 by ronald

I’ve been reading Spring in Action as part of reskilling in Spring Framework and Hibernate. The rationale of this was, I wanted a better testing capacity of my web apps, and after some review of a number of options and input from other colleagues I went down the Spring path.

Now, Spring throws a lot of new terms at you,Aspect Oriented Programming (AOP) and Inversion of Control (IoC), and it takes some time to get into the application. A Hello World example is not a simple thing, with a number of moving parts. Still no pain, no gain. The obvious change in this development path was a significant increase in XML which started to concern me.

After some more reading and examples, I came increasing worried that I’d opened a can of worms. I choose Spring to ensure betting testing capability, but instead at this time, the verdict is out. The reason is XML. There is a lot of it, and now additional testing of this is necessary. It’s necessary to ensure consistency between XML and Classes, and most importantly, from my work to date, there is a certain amount of complexity in the XML/Class coupling. While it’s a loose coupling, the ability to test incorrect or invalid Spring XML, but valid XML beckons.

In some defense, springs provides an abundance of functionality and integrations with Other Web Frameworks such as JSF, Taspetry, WebWork and Structs, ORM Persistence Frameworks such as Hibernate, JDO, iBATIS, OJB, but when it comes down to it, I’m left thinking, it takes a lot more to get started in Spring, to get productive. It all seems anti KISS and anti YAGNI.

Putting aside my initial impression after about 4 weeks, my latest order of books from Amazon arrived, and while waiting for an X-Ray yesterday, I pulled out Beyond Java. It was interesting that some comments where:

The many frameworks designed to simplify the Java development experience do make more experienced Java developers more productive, but make the learning curve too steep for those new to Java.
Tremendous tools like Hivernate and Spring can let you build enterprise-strength applications with much less effort. But it can take a whole year to confidently learn how to wield these tools with skill.
AOP can also help, by letting you write plain old Java objects (POJOs) for your business rules, and isolated services in prepackaged aspects like security and transactions.These abstractions, though, make an ever rising river for a novice to navigate.

This all adds up to one thing, Complexity, when we should be working towards Simplicity. Why is it harder to write code, surely with all these advances it should be easier to write code, in fact why are we still writing code. You have to wonder when the next jump in the technology will occur.

Then again, we are still driving inefficient cars after 100 years when there are better and more efficient alternatives.

XP Group in Brisbane

December 16, 2005 by ronald

Brisbane has another XP Group. Just found out about it. Info can be found at http://groups.google.com/group/Brisbane-XP. I’ve been involved in some part in 2 previous groups in Brisbane.

I’m thinking about some ideas myself, I’ve got all the XP skills, however I’m now skilling up in Spring (a full-stack Java/J2EE application framework) and Hibernate (a powerful, ultra-high performance object/relational persistence and query service for Java). And I’ve got 2 other friends in similar positions.

Wouldn’t it be great if for 6 to 8 weeks, a few hours a week we could work on a project honning traditional XP as well as having some experience people in technologies helping others. Of course in comes back to some giving all to others, but I’m sure it doesn’t have to be that way.

JUnit 4.0 is getting nearer

December 1, 2005 by ronald

For those Agile developers out there, JUnit requires no explaination. I’ve got CVS details and some examples in my JUnit 4.0 Tutorial.