Database Modelling within an XP Methodology

In an eXtreme Programming (XP) Agile Methodology approach to software development, the absence of adequate database design, or scant regard for it on the assumption that a framework and persistence infrastructure will take care of it, can be a disaster in a larger enterprise solution. In essence it’s a scaling effect. In a smaller system, with fewer users, less functionality and a lower volume of data, inefficiencies in database design are masked by acceptable performance. Scale the system up, or design a large enterprise system from the outset, and the effect multiplies quickly. Of course, with solid XP practices the ability to make changes and integrate them will be easier, but the amount and complexity of those changes may be significant.

A more pragmatic approach to Database Modelling and Design is necessary, especially in a larger enterprise solution using XP or another Agile approach. Assuming a Relational Database has been chosen, greater care is necessary, and advance preparation and planning are required. Purists could argue YAGNI, but ultimately the customer will be distraught if the system is perfect in functionality and user interface yet can’t handle the performance of a production load of users, or the gradual growth in users, when all was well in testing, demonstrations and customer training. The other catch is the need to add disk space monthly due to requirements nobody quantified.

Additional considerations such as legacy data migration, database sizing, database growth, performance requirements and number of users can’t conform to a traditional XP approach. These tasks require varying lead times (for example, the purchase and configuration of hardware and software), and the XP approach needs to be augmented to accommodate them.

XP is not for every project, and in the instances I’ve been involved with, certain adjustments were necessary due to the environment, the customer and, usually, management, tailored to the specific project. I’m not stating that an XP approach can’t apply to large-scale enterprise Database Modelling, rather that some adjustments, particularly within the Planning Process, are needed to arrive at a more balanced solution.

The database is the foundation. When discussing this with friends I draw an analogy to building a house. If you don’t get the core foundation correct, the slab, the essential primary fittings of plumbing and power, and a floor plan covering the key, important and known things, then you will forever be spending additional time and resources propping the house up, taking time, money and energy away from the significant parts of building it so it can ultimately be used. (Thought: I wonder how you would build a house XP style. What’s the most important part for the customer? Something to ponder one day.)

Are foundations perfect? No. Do they change and adapt? Yes, they do. However the cost of changing them is significantly higher, and the investment in getting them right the first time is invaluable.

I can’t count the number of times I’ve come onto an existing project where the Database Designer, for lack of a better word, is a novice. It makes me shudder. It’s an expertise I’ve specialised in, though I’ve had to broaden my skill set, as this task is not used throughout a traditional SDLC project of 12-24 months. It really saddens me when simple 101 mistakes are made, the downstream impact is significant, and management don’t know or realise the impact. Even worse is when I tell them what’s needed for a correction, they look at the impact, and the decision is that this fixed, known cost to correct is worse than any projected unknown downstream costs of maintenance and future development.

Ultimately I’ve got my way more easily on projects where I’ve also focused on Application Performance Tuning, looking solely at the application’s needs, not just hard-core DBA or System Administration tuning. A DBA doesn’t really care about the application and end-user impact; they care about the figures in the database. In this situation, low-level structural changes and their associated costs are weighed against the application performance ultimately necessary for the end user. I remember one project where it took the development/user/testing/release team 3 months of work (probably ~200 man-weeks) to implement and deploy a key structural change that I had identified and proposed as part of a longer performance analysis of the system. Of course, when you have 6 full-time DBAs alone, 33 remote distributed systems and thousands of users, the resultant impact and the future improvements possible were worth the investment. Not needing to upgrade the hardware was worth it by itself. This project was also over 10 years ago, and a lot of techniques have changed since then. Anyway, back to the point of the discussion.

Let me give you an example situation of how, using a traditional XP approach to software development, you can weave in a more structured database design and modelling approach.

User Stories

As part of gathering the initial User Stories from the customer, the Database Designer should be reviewing them and beginning to build a high-level Logical Data Model. The Database Designer is not needed in any initial interaction with the Customer; however, early reviews may clearly indicate gaps in stories, based on historical experience, that can then be fed back to the customer for consideration. Initially the model should just be on paper or a whiteboard, corresponding to the high-level view of the User Stories and their fluid, changing nature. However, it should immediately highlight known key entities, key relationships between entities, key integrations to external systems, and areas that need early volume estimates of data, perhaps by comparison with existing legacy systems.

For example, a rates billing system I was involved in had ~400,000 clients billed quarterly (4 times a year), and each bill had on average 10 line items (just looking at a small sample, and getting a figure from the legacy system). Now it doesn’t take much to consider the size of the tables recording billing history (400,000 x 4 x 10 rows per year), or GL reconciliation over 5 years (400,000 x 4 x 10 x 2 x 5, that’s 160 million rows). Disk storage can be guesstimated once a number of key entities are identified, even without any hint of the average row length or indexes. Of course it could be out by a factor of 2, 5 or 10 at this early stage, but not by a factor of 100.
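
As a back-of-the-envelope illustration of this kind of estimate (the average row length here is a pure assumption for the sketch, not a figure from the project):

// Rough volume estimate for the GL reconciliation table described above.
public class VolumeEstimate {
    public static void main(String[] args) {
        long clients = 400000L;          // from the legacy system
        int cyclesPerYear = 4;           // quarterly billing
        int lineItemsPerBill = 10;       // average, from a small sample
        int glEntriesPerLineItem = 2;    // double-entry reconciliation
        int years = 5;

        long rowsPerYear = clients * cyclesPerYear * lineItemsPerBill * glEntriesPerLineItem;
        long totalRows = rowsPerYear * years;   // 160,000,000

        int assumedAvgRowLength = 100;   // bytes; an assumption for illustration only
        long approxBytes = totalRows * assumedAvgRowLength;

        System.out.println("Rows after " + years + " years: " + totalRows);
        System.out.println("Approx. data size (no indexes): "
                + (approxBytes / (1024L * 1024 * 1024)) + " GB");
    }
}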

In a situation like this, for any entity, even on the first mud map, I would record a number of indicators where possible:

  • Initial number of rows
  • Annual growth in rows
  • Projected rows after ‘x’ years

In addition I’d also roughly flag each entity for performance considerations, by access pattern and volume of transactions. This helps identify key performance considerations which may impact the Database Design in areas such as indexes (i.e. disk space) and schema optimisations after normalisation.

  • OLTP
  • Batch

For example, the billing process that creates quarterly bills and inserts millions of rows is a Batch process with a frequency of once every three months, as opposed to creating new accounts, which is OLTP and happens at a fairly steady rate per day, every day, throughout the day.
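
To make these mud-map indicators concrete, here is a minimal sketch of how the per-entity figures and flags might be captured (the class and field names are purely my own illustration):

// Per-entity sizing and access indicators captured during early modelling.
public class EntityEstimate {
    enum AccessPattern { OLTP, BATCH }

    String name;                 // e.g. "BillingHistory"
    long initialRows;            // rows at go-live (e.g. legacy migration)
    long annualGrowthRows;       // expected new rows per year
    AccessPattern accessPattern; // dominant access style

    EntityEstimate(String name, long initialRows, long annualGrowthRows,
                   AccessPattern accessPattern) {
        this.name = name;
        this.initialRows = initialRows;
        this.annualGrowthRows = annualGrowthRows;
        this.accessPattern = accessPattern;
    }

    // Projected rows after 'x' years, per the indicators listed above.
    long projectedRows(int years) {
        return initialRows + annualGrowthRows * (long) years;
    }
}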

Now, in the mud map these figures don’t have to be accurate. In fact, in the billing system example, if a table didn’t have 10,000 rows it didn’t rate a mention, and 10,000 to 400,000 was considered small, the threshold being set by the volume of the most significant entity. A number of tables will be a factor larger based on this key figure, and those will swamp any insignificant tables. Again, this is week 1 of a potential 52-week project; it can afford to be vague in areas.

Spike

How do you term this in an XP environment? Call it a Spike. The purpose of a Spike is to explore options and evaluate risk. While the general practice would be to throw the results away, in this case these Spikes should be kept and built on progressively.

Release Planning

As the Customer’s User Stories are estimated and prioritised within Release Planning, more refined Logical Modelling, and even Physical Modelling, is necessary. Of course, until the customer refines priorities based on estimates, the actual order of implementation can’t be confirmed.

It is possible, however, that a number of stories within an iteration may not relate to each other within the Database Model. How do you address this? Even where the actual attributes of the logical entities are not yet available, a physical model representation can be built of the necessary tables and relationships to ensure the functionality is practically usable.

Should missing relationships within the Database Model impact the priorities of User Stories? No. Estimates should accurately reflect whether additional work is needed for any given User Story; hence the Database Designer is necessary in the estimation process.

Iteration Planning

As part of Iteration Planning, it is key that the physical Database Requirements are in place to enable Writing Unit Tests and Writing Code for each Development Story in the Iteration. This should occur at the start of the Iteration, but not before.

Daily Stand Up Meeting

The Daily Stand Up Meeting is the best opportunity to advise of structural impacts, or for developers to request more involvement by the Database Designer in their specific task.

No system is perfect, and there will be times when the Database Model does not reflect what a coder requires for a task. At that time the developer is responsible for adjusting their own instance of the Database Model to complete the coding task. The issue is: does the coder have the ability to modify the database schema when checking in? No. Responsibility for the Database Model rests with the Database Designer, a difference from the normal coding practice where any coder can modify any code. The next section describes the reason in detail. At this point there is an inconsistency within the repository, and any automated testing for a build will fail. And this is what should happen. From the developer’s perspective their instance of the code is correct, but for the entire iteration it is not.

One could argue this is not valid. However, the cycle of Test, Code, Refactor, Check In, while applying to the individual task, should also apply to the Database Design, only at an iteration level rather than a task level. Having your continuous builds breaking will raise a lot of red flags, and I’m sure this approach will ensure that the tests broken as a result are corrected against the schema definition (the Code), there will be some refactoring if, for example, the developer didn’t serve the best interests of the bigger picture, the schema is checked in, and the continuous build bar is green again. Yippee. The bar is green, the code is clean, we can all go home now.

The Database Designer

The Database Designer should be a dedicated resource in the development team for this task. For example, in a team of 6-10 developers you would have 1 or 2 people; even in a team of 2 you would have 1 person responsible. Unlike coding, where any of the team pair up and work on tasks within the Iteration, the Database Design team is responsible for Database Design just as the Development Team is responsible for Coding. There are a few reasons for this:

  • They are ultimately responsible for the structure, the foundation, and this is the bigger picture that includes visibility and scope of further iterations and releases.
  • The skills are more specific.
  • They should also be a part-time developer on the team, so as to best understand its dynamics and be part of it.

Correspondingly, the end of the iteration should include an additional, schema-related code review. A difference report of the schema definition at the start and end of the iteration should be produced and compared, to ensure best practices in the larger picture as well as standards, optimisations, future performance considerations (e.g. indexes) and future disk requirements (e.g. adding a new index to a 160 million row GL table will take around 4GB of disk space, roughly 25 bytes per index entry).

Other Factors

In fact, as part of my upcoming conference presentation, Overcoming the Challenges of Establishing Service and Support Channels, I spend some time discussing Data Quality. Quite often this is the bane of support, due to the complexity of writing software to cater for data exceptions, or most commonly data anomalies where historical data does not meet the minimum referential integrity and data specifications of your new system.

Ruby

Being a little despondent regarding Spring, a framework I’d chosen to skill up in (Read More), I’ve changed tack to investigate Ruby further. I was in a training demonstration of Ruby late last year, I’ve had other colleagues talk about it, and in a number of recent readings Ruby has been making an impact, so it’s time to delve in. I’ve got my working notes on my Wiki, and it took all of a few minutes to be operational. There appears to be a good wealth of reference material available, including at least 2 online books. You can check these out in my Ruby References section.

Here is a comment from one of the books I’m currently reading, Beyond Java:

My partner and I decided to implement a small part of the application in Ruby On Rails, a highly productive web-based programming framework. We did this not to satisfy our customer, but to satisfy a little intellectual curiosity. The results astounded us:

  • For the rewrite, we programmed faster. Much faster. It took Justin, my lead programmer, four nights to build what it had taken four months to build in Java (RB: they were using Hibernate, Spring and WebWork). We estimated that we were between 5 and 10 times more productive.
  • We generated one-fourth the lines of code; one-fifth if you consider configuration files.
  • The productivity gains held up after we moved beyond the rewrite.
  • The Ruby on Rails version of the application performed faster. This is probably not true of all possible use cases, but for our application, the ROR Active Record persistence strategy trumped Hibernate’s Object Relational Mapping (ORM), at least with minimal tuning.
  • The customer cared much more about productivity than being on a safe Java foundation. (RB: Highlighted)

As you can well imagine, this shook my world view down to the foundation.

From Beyond Java by Bruce A. Tate, pp. 3-4.

Book Review (Part 1) – Better, Faster, Lighter Java

Well, if the weight of the book has anything to do with it, it’s the lightest Java book I’ve got. Better, Faster, Lighter Java, which I got from Amazon, has been a quick read. I’ve done a quarter of the book (60 pages) in one bed-time reading. Some good information; I’ll provide a review when I’ve finished the book. What’s surprising is that in the content that can be verified solely by programming (i.e. the code), there are already a number of errors. Here’s a summary of the comments I’ve sent to the publisher (just showing the technical stuff).

Example 1-1. Counter example: implementation (pages 3-4)

Point 1:
Book: Mid page (page 3), you have public abstract Long getID();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following lines (page 3):
public abstract void setID(Long id);
public Object ejbCreate(Long id, int count)
public void ejbPostCreate(Long id, int count)

Example 1-2. Local Interface (page 5)
Point 2:
Book: Top of page (page 5), you have public abstract Long getId();
Comment: ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line (page 5):
public abstract void setID(Long id);

Example 1-3. LocalHome interface (page 5)
Point 3:
Book: 4th last line (page 5), and 3rd last line.
Comment: Same comment as Points 1 & 2. ‘Long’ should indeed be ‘long’ with a lowercase ‘l’.

This problem also occurs on the following line (page 5):
public abstract void setID(Long id);

Example 1-4. Transparent counter (pages 13-14)

Point 4:
Book: On the second last line of page 13, you have private string name;
Comment: ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Point 5:
Book: On the first line of page 14, you have public void setName(long newName) {
Comment: ‘long’ should indeed be ‘String’

Point 6:
Book: On the fourth line of page 14, you have public string getName() {
Comment: As per Point 4, ‘string’ should indeed be ‘String’ with an uppercase ‘S’.

Figure 2-1 (pages 18-19)

Point 7:
Book: You list 7 points that correspond to the numbers in Figure 2-1, point 7 being “Easier to Maintain”.
Comment: You have no point 7 in your figure.

Unreferenced Code (page 25)

Point 8:
Book: Second line of code in the section: String prefix “This code is “;
Comment: You are missing the necessary assignment character = (equals) between prefix and “This…”.

Unreferenced Code (2nd Example) (page 25)
Point 9:
Book: The lines result = result + “much, “; and result = result + “simpler, and neater.”;
Comment: While this is correct, it is simpler still to replace result = result + with result += on both lines. It would hardly be worth the mention, except that you are explicitly trying to demonstrate “simpler and neater”.

Unreferenced XML (page 32)

Point 10:
Book: In the middle of the code you have the line <include name=”**/*Test.class” />
Comment: While this is indeed valid, it would not work with the example you are attempting to automate with Ant. In your example, you define your JUnit test case class name as TestAdder, so this statement would not include the tests. It should indeed be **/Test*.class, with the third * (asterisk) being a suffix to Test, not a prefix.

Unreferenced Code (page 45)

Point 11:
Book: Second line of the coding section: Account valueObject;
Comment: You do not use this variable; while not an error, it is unnecessary. Refer to the next point for more information.

Point 12:
Book: Middle of the code: account.setAccountNumber(…..), and the following line account.setBalance(….
Comment: You define no Account variable called ‘account’, so you do in fact need a line of syntax Account account; at some point. Even this is invalid, as you have not obtained an Account object in order to use the setter methods setAccountNumber() and setBalance(), so you would need Account account = new Account();. This too is invalid, as your previous code (pages 43-44), which defines the Account class, has no no-args constructor, and because the class declares a constructor with arguments, no implicit default constructor is generated. Either way, it’s not clean, sound code in regards to new Account().
And then, to complete these 2 lines, setAccountNumber() is not a defined method of Account; indeed you explicitly omit it per your own comments (page 43: “Remembering your requirements, you want to keep the account number private, so you scope it accordingly, and omit the setter”).

All that being said, you could do the following as an alternative to these lines.

Account account = new Account(result.getString(“accountNumber”), (float) result.getDouble(“balance”));
return account;

This could be simplified even further to simply:

return new Account(result.getString(“accountNumber”), (float) result.getDouble(“balance”));

Nice and clean.

Point 13:
Comment: While on the subject of this coding example, and given the changes necessary, I’d make two other comments regarding this code fragment.
Firstly, you close your stmt variable with stmt.close(), but you don’t close your ResultSet variable result. Good coding practice would close both, with appropriate error checking as shown for stmt.
Secondly, while this method is static, I would not choose to use a global Connection variable referenced as conn. This should be passed to the method as an argument.
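
Pulling those two comments together, a minimal sketch of the shape I’m suggesting might look like this (the method signature, SQL and table name are my own illustration based on the book’s example, not the book’s actual code; it assumes java.sql.* imports and the book’s Account class):

static Account getAccount(Connection conn, String accountNumber) throws SQLException {
    // Connection is passed in rather than referenced globally.
    Statement stmt = null;
    ResultSet result = null;
    try {
        stmt = conn.createStatement();
        result = stmt.executeQuery(
            "SELECT accountNumber, balance FROM account WHERE accountNumber = '"
            + accountNumber + "'");
        result.next(); // assumes the account exists; a real version would check
        return new Account(result.getString("accountNumber"),
                           (float) result.getDouble("balance"));
    } finally {
        // Close both the ResultSet and the Statement, in that order.
        if (result != null) { try { result.close(); } catch (SQLException e) { /* log */ } }
        if (stmt != null) { try { stmt.close(); } catch (SQLException e) { /* log */ } }
    }
}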

Of course I’ve only looked at the first quarter of the book, as some bedtime reading, and I haven’t actually taken this code and run it through a compiler, but I wanted to bring these comments to your attention. I am, however, enjoying the content of the book so far.

The Java Spring Framework


I’ve been reading Spring in Action as part of reskilling in the Spring Framework and Hibernate. The rationale was that I wanted better testing capability for my web apps, and after reviewing a number of options and getting input from other colleagues, I went down the Spring path.

Now, Spring throws a lot of new terms at you, Aspect Oriented Programming (AOP) and Inversion of Control (IoC), and it takes some time to get into the application. A Hello World example is not a simple thing; there are a number of moving parts. Still, no pain, no gain. The obvious change in this development path was a significant increase in XML, which started to concern me.

After some more reading and examples, I became increasingly worried that I’d opened a can of worms. I chose Spring to ensure better testing capability, but at this point the jury is out. The reason is XML. There is a lot of it, and now additional testing of it is necessary: it’s necessary to ensure consistency between the XML and the classes, and most importantly, from my work to date, there is a certain amount of complexity in the XML/class coupling. While it’s a loose coupling, the ability to test configuration that is valid XML yet invalid Spring wiring beckons.
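
As a minimal sketch of the kind of test I mean, simply loading the context catches wiring errors at build time rather than at deployment (the file and bean names here are hypothetical, assuming a typical Spring 1.x setup with JUnit):

import junit.framework.TestCase;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

// A misspelled class name or dangling bean reference in the XML is still
// perfectly valid XML; it only surfaces when the context is actually loaded.
public class BeanWiringTest extends TestCase {
    public void testContextLoadsAndWires() {
        ApplicationContext context =
                new ClassPathXmlApplicationContext("applicationContext.xml");
        assertNotNull(context.getBean("accountService"));
    }
}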

In some defence, Spring provides an abundance of functionality and integrations with other web frameworks such as JSF, Tapestry, WebWork and Struts, and with ORM persistence frameworks such as Hibernate, JDO, iBATIS and OJB. But when it comes down to it, I’m left thinking it takes a lot more to get started in Spring, to get productive. It all seems anti-KISS and anti-YAGNI.

Putting aside my initial impressions after about 4 weeks, my latest order of books from Amazon arrived, and while waiting for an X-ray yesterday I pulled out Beyond Java. It was interesting that some of the comments were:

  • The many frameworks designed to simplify the Java development experience do make more experienced Java developers more productive, but make the learning curve too steep for those new to Java.
  • Tremendous tools like Hibernate and Spring can let you build enterprise-strength applications with much less effort. But it can take a whole year to confidently learn how to wield these tools with skill.
  • AOP can also help, by letting you write plain old Java objects (POJOs) for your business rules, and isolate services in prepackaged aspects like security and transactions. These abstractions, though, make an ever-rising river for a novice to navigate.

This all adds up to one thing: complexity, when we should be working towards simplicity. Why is it harder to write code? Surely with all these advances it should be easier; in fact, why are we still writing code at all? You have to wonder when the next jump in the technology will occur.

Then again, we are still driving inefficient cars after 100 years when there are better and more efficient alternatives.

Handling User Requests of 'I got an error!'

Don’t you just hate it? A user, at the end of the day, the week, or only when you ask them, says “I got an error.” OK, well, can you give me some more information? “No” is the normal answer.

I presently manage this in my applications with the following 2 measures:

  • Any 500 error pages are automatically emailed, so I can see the frequency and, hopefully via a stack trace, get a good indication of the location.
  • For every requested URL, I log the date/time, user, sessionId, URI and all parameters (GET & POST).

Given this, when an error occurs I can usually see what the person was doing via Access Log reporting. The issue is that it’s not reproducible (if it’s a POST): I can’t just retry the command, and given the coding practices, I can’t just cut/paste my log output (as that would be a GET, and in the event of a POST the code would ignore it). In addition, it doesn’t fully show what the user actually saw on the screen.

What I really want to see is the actual screen the user saw. In essence, I want to log the HTTP response of every HTTP request and attach this to my logging, so that for every URL requested I can see the rendered HTML output. There are some minor limitations, such as post-processing JavaScript (an AJAX application would not really work here), and I’d definitely want to catch the response at the application server and not on the client (who knows what additional work a browser might do to your code).

So the dilemma is how I can achieve this currently using Apache Tomcat. I’m assuming a generic option, independent of application server, would not present itself.
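
One avenue that looks promising is the standard servlet Filter mechanism with a response wrapper, which would work in Tomcat without anything server-specific. A rough, untested sketch (the class names are mine; it handles getWriter() only, not getOutputStream(), and ignores buffering and content-length subtleties):

import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.*;
import javax.servlet.http.*;

// Captures the rendered HTML of each response so it can be attached to the
// existing per-request log (date/time, user, sessionId, URI, parameters).
public class ResponseLoggingFilter implements Filter {

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        BufferingWrapper wrapper = new BufferingWrapper(response);
        chain.doFilter(request, wrapper);

        String html = wrapper.getOutput();
        log(request.getRequestURI(), html);   // hook into the existing logging
        response.getWriter().write(html);     // pass the page on to the client
    }

    private void log(String uri, String html) {
        // Write to whatever log store the access logging already uses.
    }

    // Buffers writer output instead of streaming it straight to the client.
    private static class BufferingWrapper extends HttpServletResponseWrapper {
        private final CharArrayWriter buffer = new CharArrayWriter();

        BufferingWrapper(HttpServletResponse response) { super(response); }

        public PrintWriter getWriter() { return new PrintWriter(buffer); }

        String getOutput() { return buffer.toString(); }
    }
}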

My search continues.

What do you want in a job?

Well, it’s a good question. As part of sorting out tax documentation I came across the following list (go figure why it was with the tax stuff).
I made this list sometime in mid 2002, and I remember specifically the job I made it to evaluate. I’ve yet to take adequate time to review each point carefully and revise it, however it’s a good starting point for looking back at what you thought previously.

This list is not so much what I want in a job as a checklist of what I would like, so that at the end I can score how well a job fits these items.

  1. Use of Existing Core Skills – Technologies
    • Oracle, Unix, Java, HTML/Web
    • Open Source Projects (Apache – HTTP/Java/XML, MySQL, PHP, etc)
  2. Use of Existing Core Skills – Experience
    • Database Modelling
    • Large Systems Design & Development
    • Application Performance Tuning
    • Technical Problem Solving
  3. About Newer Technologies
    • Encourage/Embrace use
    • Not bleeding edge
    • Opportunity for learning new relative skills
  4. Work Environment
    • Equipment
    • Location
    • Core working hours
    • Work at home options
  5. Team Environment
    • Ability to undertake varying roles (and not undertake others due to other team members)
    • Existing working relationship with people
    • Across Development/Management/Support Teams
  6. Project Environment
    • Durations: 1 month to 1 year
    • Challenging
    • Varying
    • Problem Solving
  7. Project Methodologies
    • Emphasis on Quality Procedures
    • Emphasis on Software Testing
    • Emphasis on Customer
  8. Remuneration
  9. Other
    • Some Fun

Linux Format Reader Awards 2006

The Linux Format magazine is having its annual reader awards in a number of categories.

These include (I’ve included my picks after each category):

You can nominate at http://www.linuxformat.co.uk/nominate/. Nominations close Friday 10 Feb 2006.

What's in a Wiki?

I’ve been wondering what exactly a wiki is, and it’s not the first time I’ve asked myself. Here is an explanation I came across:

A wiki, from the Hawaiian term for “quick,” is an ongoing, ever-evolving, organized compilation of information.

The theory behind wikis is simple: one know-it-all is not enough; more is better. Unlike other Web sites created and managed by a single person or entity, wikis are truly for the people, by the people. But the model has also come under fire, underscoring the danger of wikis: they are open to everyone, including pranksters and those with an ax to grind or who want to revise history in their favor.

Wikis enable mass collaboration to build all kinds of things, and are becoming a new model for doing things on the Internet. The software is not magic; you can’t throw up a wiki and hope a miracle will occur. It takes thoughtful people to make it run.

Wikimania is growing dramatically. In October, 16.3 million people visited Wikipedia, a 267 percent increase from the same month last year, according to Nielsen/NetRatings. Overall, the site has compiled 2.6 million articles, including 840,000 in English.