Guidelines for managing embedded external project dependencies

I’ve yet to find any Java project that doesn’t have dependancies on some other Open Source external libraries. I’ve yet to find a Java project that manages these external dependencies appropiately for support and integration at an enterprise level.

As with most projects, understanding an applying sound principles that scale will help you at a later date, and generally the cost of implementation is minimual at the start, but of course becomes more expensive when it’s really needed. The classic case is Version Control. For over 10 years, even on small single developer projects, I’ve used Version Control, it should be taught at university as an introduction to good programming design, it would greatly benefit software development and maintenance.

Back onto the topic of hand. Let’s use a moderate Java Web Based application, and for the purposes of this discussion the following Open Source external libraries are incoporated. Log4J, JUnit, Canoo WebTest, MySQL JDBC, Apache Commons (Collections, DHCP, Pool, HTTPClient, Taglibs Mailer). I could continue, but this will suffice for the demonstration.

It’s very easy for your project to include the appropiate jar’s such as (log4j.jar, junit.jar, commons-pool.jar etc), however this is where support and integration with other products fall down.

A Controlled Approach

You need to keep a seperate repository (under source code control of course) of your external libraries, and this becomes the source across all corporate projects. This is to include the following for each library:

  • The actual deployed jar
  • The matching source code of the deployed jar
  • Java Documentation of deployed jar

Versioning

Log4J is an example of an Open Source project that does version their jars, will many other open source projects do not. Why don’t they? Well one reason is to enable people to upgrade easily but simply overriding existing versions, and processes that have specific CLASSPATH’s are not affected. Generally today, implementations of software include all jar’s within a specified directory so I don’t see the problem.
Log4J gives in this example a log4j-1.2.12.jar for deployment purposes. When libraries do not include a version number, the are to be specifically added. This adds another small delemma of standards. The general practice is to use the hyphen ‘-‘. followed by the product version using the full-stop ‘.’, however there are projects that don’t follow this.

Version Recording

So now we have for each external library, an appropiately versioned jar, and matching source and documentation. This is the initial baseline. What’s needed is a simple HTML index that manages this information for use. The Index should include:

  • Product Name
  • Product URL
  • Repository Version
    • Version Number
    • Version Date
    • Download Date
  • Latest Version
    • Version number
    • Date
    • Comment

You may ask, why do you record the Latest Version, when the practice should be to always get the latest version. JUnit is a good example, the present version 4.x, requires a JDK 1.5.x deployment, and if your application is running only 1.4.x, then you can only use older 3.8.x versions.s mentioned earlier,

Management

Having an index of external libraries is one thing, correct use and management is the most important step. Let’s assume we have taken the time to download and document the required libraries from our example, and everything has been deployed into our first project.

Now, a 2/3 month task of checking for updated versions can be scheduled. Withing this process, newer versions can be downloaded and recorded appropiately. In our example, Log4J now has 1.2.13. Updating the external libraries repository is the simple part, the next step is to notify all coporate projects of the new version, and to encourage uptake. This may not always occur in a timely manner, but with at least this baseline in place and when there are issues, standardisation on the known coporate version is the first step.

Dependencies

Within each project libaries, a readme that details which versions of which external jars are included andwhen they were last updated from the repository should be done. Noting this information with the both the external libraries repository and the project repository provides a paper trail. In addition, should there be any exceptions, this is the place this information can be reported.

External Projects

Canoo WebTest is a good example of an external project that also includes other external libraries such as Log4J, JUnit, HtmlUnit, NekoHtml.
Problems arise when these products may use and implement older or unknown versions of libraries.

Internal Projects

Having internal projects that are dependent on other internal projects is nothing new in a large corporate enterprise. The problem arises when a spaghetti of undocumented dependancies causes a management nightmare. Let’s take this real life example.

Product A has included jars of Product B, Product C, Product D and Product E. The Product C actually has a different version of Product D embedded within in. Product D which now is included twice also includes Product E, so there are now three copies of this, all are different in size, and all a binary only with no version numbers, and no corresponding matching source code. Does this sound bad? Well it is. How it ever worked is still amazing.

This mess could have been managed first with Version Control (a basic 101 in software development), and an appropiate management of external libraries, and a similar approach to internal libraries.

An Example

This is a great example to highlight the cost in lack of appropiate management. I was supporting an existing large scale project (1000+ users) (let’s call this Product A), and the integration of a new project (let’s call this Product B) had been passed from the development team for implementation, testing and release. A threaded process, it would simply just hang after some initialisation, no notice, no errors, just nothing. Not withstanding that something should have been better reported for the errors. Due to 7 possible log files between the software application and the application server, nothing was reported, but that’s another topic.

The final result was Product B had introduced the use of org.apache.log4j.Logger.trace(), a new more granular logging then debug(). The appropiate Log4J jar had been included in the product, and this was Version 1.2.12. However, Product A, which was using Product B, was bundled with an earlier version of Log4J, Version 1.2.8, and this version didn’t support this new method.

While it took a few hours of debugging to find this problem, it was made easier because at least these jar’s were version, of the 20-30 jars across products only 3-4 were versioned. Similar problems with QName and XMLBeans unamed jars prior to this took days to resolve (indeed one had to be worked around as it couldn’t be resolved).

A further complication in this process was when Product B was introduced. This was developed and built under Linux, while Product A was still being maintained under Windows. From the experience of Integration is was found that the order of loading within the classloader of a commerical application server differed between operating systems.