Digital Tech Trek Digest [#Issue 2024.08]

The One Billion Row Challenge Shows That Java Can Process a One Billion Rows File in Two Seconds

Well, it’s way under 2 seconds for the 1brc.The published results are in and if you’re good you can read 1 billion data points of weather data and analyze it. The final best number as per the article release is “00:00.323″. Yes, that answer is in milliseconds “Result (m:s.ms)”. Mind-blowing.

ScyllaDB Summit 2024

Last week I attended this virtual event. All the presentations can be found online. I had never used the product before so while some new features like Tablets were not as applicable in understanding the full impact, the comparisons of DynamoDB performance and cost were very applicable.

So what is ScyllaDB? It is a distributed NoSQL DBaaS that speaks Cassandra protocol (do large companies still use this?), and it speaks AWS DynamoDB protocol. That is really interesting to me. You can choose a Cloud Hosted offering, or if you’re into managing your setup, you can use the Open Source ScyllaDB version available from GitHub. I started at ScyllaDB University to get a grip on the basics. I have yet to try the local Docker Compose setup.

Thanks also to the team for the swag which I received.

Playing a game with your CI/CD pipeline

My friend Sergey has created a game in GitLab called GitTerra. Drop a few lines into your .gitlab-ci.yml and each build will give you a generated 3D map of a city based on your commit. I really look forward to some of his next steps, leveraging potentially different colors for languages, or different building structures for artifacts found in your commit.

We raised 11.6M to build Serverless Postgres for Modern SaaS

Congrats to Gwen and her co-founder for getting seed funding for Nile Serverless Postgres for Modern SaaS. Awesome news for an entrepreneur and I’m very hopeful for the success of Nile.

The Safest Way to Test Postgres Destructive Queries

While I am a user of ElephantSQL serverless PostgreSQL and Neon, Nile and Xata are just a few that are competing in the space. With multiple other products also speaking PostgreSQL protocol, you can easily trial a small product in an RDBMS in the cloud at no cost. PostgreSQL is definitely outdoing MySQL in this space. You have the extensive set of NoSQL Cloud offerings, SycllaDB I just mentioned, and D1 by CloudFlare I have yet to try this branching feature for your database, sounds interesting and I’ve added to my just as long list of products to try, as books to read. Nit: It’s PostgreSQL, not Postgres.

About “Digital Tech Trek Digest”

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short committed time window is a summary of what I learned, what I should learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow. Also Scientific American Technology, Fareed’s Global Briefing, Software Design: Tidy First? by Kent Beck, Last Week in AWS, Micro Newsletter to name a few.

Digital Tech Trek Digest [#Issue 2024.03]

Lessons from going freemium: a decision that broke our business

As an entrepreneur always considering how to produce a sustaining passive revenue, what licensing model to use, and how to acquire and retain customers, the allure of a freemium model is ever present in so many offerings. You may wish to read this article and look at the visualizations provided with the narrative. I found this a useful data storytelling example.

The allure of seeing a new product is the strongest motivator new users have to complete setup. If you make onboarding too easy, they’ll never come back to do the hard task you let them skip.

Read more at Lessons from going freemium: a decision that broke our business Source: TLDR

Newsletters and online content creators

Lenny’s Newsletter from the prior article, listed with over 574,000 subscribers is one of several Substack newsletters I subscribe to. Substack is described as “The subscription network for independent writers and creators”. I have been collecting the number of subscribers from several newsletters I follow, however, there is no way to see that growth over time. Also missing are the price rates over time and the ratio of free to paying subscribers. Random Idea: What is missing is a history of this information. Other stats I’ve noted previously include 66,000 subscribers for Kent Becks Newsletter with 3 subscription plan offerings, 1,250,000 subscribers for the free TLDR (I can remember this years ago being much less) and 65,000 subscribers for the Seattle Data Guy newsletter.

FWIW this post from Lenny’s Newsletter This newsletter is growing up is from 2020.

Golden Kitty Awards 2022

I came across the Golden Kitty Awards which unfortunately are only current to 2022 (fail on being current). It was interesting to scan the list for innovative ideas. I’ve yet to visit any sites, but I’m always encouraged by what people think of and commit to building regardless of the motivation or incentive. What counts is an entrepreneur takes an idea and releases a product. 

Source: Random

Streamer JS – Video stream layout manager for OBS Studio and other streaming applications.

I am a new user of Twitch streaming for personal projects.  My good friend Sergey Chernyshev organizer of the large New York Web Performance Group has created Streamer JS as a means to drive more dynamic content in the browser and with common languages of HTML/CSS/Javascript and using OBS more as the streaming only component.  One objective is to have better version control management of assets/scenes/sources/filters/etc.  It’s interesting that  PouchDB is an eventually consistent distributed datastore in Javascript. Yet another simple data store to review for suitability. 

Source: Word of mouth

The 37th Chaos Communication Congress (37C3) by the Chaos Computer Club

Last month I was introduced to the Chaos Computer Club. This is a large German-based annual tech conference with a focus on security & infrastructure/hacking.  Over 100 talks from the most recent event last week have been posted here.

Source: Word of mouth

About ‘Digital Tech Trek Digest’

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short committed time window is a summary of what I learned today, what I should learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow. Also Scientific American Technology, Fareed’s Global Briefing, Software Design: Tidy First? by Kent Beck to name a few.

Digital Tech Trek Digest [#Issue 2024.02]

Indie Newsletter Tool Generates $15,000 a Month

There are so many different email newsletter sites you could wonder if there is market saturation. MailChimp, Mailgun, ConvertKit, Sendgrid (now part of Twilio it seems), Moosend and Mailersend come to mind.

It seems the space still has plenty of revenue-producing options including buttondown.email reportedly a side gig generating $15k per month. Source: BoringCashCow

When I asked a good friend and author of the Technical SEO Weekly his use of ConvertKit directed me to this Baremetrics Dashboard which is another product to look at sometime.

LLMs and Programming in the first days of 2024

How do use an LLM? If you are still on the fence start getting into the habit of using it more frequently then start. I now use ChatGPT and Claude AI daily, and with a crowded market there are many other emerging technologies to also consider.

I use ChatGPT for coding and image generation with DALL.E. I use Claude more for reviewing large documents that seem to be ideal for producing a summary, or to generate a fictitious movie script from those documents.

I do not like Javascript nor do I wish to actually learn this language however I write it daily via ChatGPT. Javascript is the ever-changing technology of web development and it’s impossible to keep up with the next product, or version of a product you may know. ChatGPT helps me navigate this combined with asking for HTML and TailwindCSS.  However, it’s not perfect, you need to be an experienced engineer who has learned how to write code for many years to ask the right questions and to correct the LLM when it does not produce what you expect. Let’s look at CSS. Now there is flex and grid and it’s hard to keep up with changing features that browsers support. This is where ChatGPT has helped me. I have been using Tailwindcss but it still took an expert friend 30 minutes to help me debug a CSS formatting issue of a future OBS Twitch streaming project to correctly size the content all in a 1920×1080 box. I learned a lot of new features of Google Chrome Developer Tools Inspector I did not know and are probably just the start of expert debugging features.

Until a few months ago I never knew it’s now much easier to read JSON in Javascript.

async function fetchData() {
  try {
    const response = await fetch('data.json');
    const data = await response.json();
    console.log(data);
    return data
  } catch (error) {
    console.error('Error fetching data:', error);
  }
}

let data = await fetchData();

I’d like to remind users that  ChatGPT can make mistakes. Consider checking important information.. Source:  TLDR

ParadeDB (GitHub Repo)

Every day there is another PostgreSQL product to review.  I am a current user of ElephantSQL which I didn’t know existed two months ago. Neon and Tembo are two more PostgreSQL serverless-related products on my product review list.  Now adding ParadeDB as well as reading Thoughts on PostgreSQL in 2024.

About ‘Digital Tech Trek Digest’

Most days I take some time early in the morning to scan my inbox newsletters, the news, LinkedIn, or other sources to read something new covering professional and personal topics of interest. Turning what I read into some actionable notes in a short committed time window is a summary of what I learned today, what I should learn and use, or what is of random interest.

Some of my regular sources include TLDR, Forbes Daily, ThoughWorks Podcasts, Daily Dose of Data Science and BoringCashCow to name a few.

Moving to using dash ‘-‘ as a delimiter

Spaces or tabs. Using 2 spaces verses 4 spaces. Defining variables as lowercase v InitCap, environment variables UPPERCASE or not, using underscore ‘_’ or dash ‘-‘ as a separator. These are all un-winnable discussions among your friends and colleagues. Side Note: What is critical for any business is there is a well-defined and enforced standard, no matter what that standard it.

I have while scripting in Bash always used underscore ‘_’, for commands or internal functions. I have probably taken on this because environment variables have IMO always used underscore, e.g. looking at my laptop now I have for example (TERM_PROGRAM, TEE_HOME, AWS_DEFAULT_REGION).

However in recent months I have changed my coding practices within scripting to always using the dash ‘-‘. What was list_clusters, became list-clusters for example.

Why? There is just one reason; the bane of millions of effective typists and coders; the QWERTY keyboard. The dash ‘-‘ requires the use of just one finger, while the underscore requires two.

Sometimes the simpler solution is well simpler.

The agile software development lifecycle responsibility

The eXtreme Programming (XP) methodology places emphasis on a number of core principles for agile software development. These include (and are not limited to) the planning game, short and frequent iterations, testing, frequent refactoring, continuous integration, ownership and standards.

Identifying the problem

These core principles however are not the full lifecycle of software development. This is really only a portion of the lifecycle. What is lacking is the definition for the ongoing responsibility and ownership by the creators of software in the sustainability of said software for the lifetime of use and benefit to an organization.

An agile methodology approach (of which XP is just one) fails to expose and describe the full operational cost within development, testing and deployment. Just as a single line of code is viewed a hundred times more than the time is was written, the usage of that code in the full lifecycle of an organization is potentially a magnitude more investment of time and resources.

Software development is not just about new feature creation. It is also ensuring full product ownership and responsibility consistently. It is also ensuring that in a larger organization, compatibility and consistency can occur with other products. In other words, it is thinking of software for the whole organization, rather than the sum of individual parts.

Scheduling lifecycle management time

Development and engineering resources already apportion time between planning, development and unit testing. There needs to be a second more important consideration. An apportionment of time between product features, product stability and product maintainability.

A good assignment of time to cover the full lifecycle adequately is:

  • 60% of time to feature design, development and product support (i.e. bugs)
  • 20% of time in stability and sustainability management of the existing technology stack (i.e. refactoring and testing)
  • 20% of time in overall lifecycle management of delivered functionality (i.e. ongoing ownership)

Conveniently this Pareto allocation can be seen as 80% for development time and 20% for time generally considered operations.

Sustainability Management

Remember the core principles of XP that included frequent refactoring and standards. How much time is spent on refactoring code to provide a better, more consistent, more testable codebase for an application after code is initially deployed? What about across multiple applications in your organization. Engineering resources rarely invest any time let alone actively scheduling time for code maintenance by the entire engineering organization, yet there are immediate benefits. It can be amazing how more performant a system is when unnecessary code is simply deleted from software that has gradually evolved over time. The compounding benefits can mean less code to view by developers and thus adding incremental efficiency. Less code to deploy also means smaller installation and application footprint. Particularly when the code is unnecessarily executed in the common usage path.

Engineering teams in general are more focussed on delivering new functionality or fixing issues with newer functionality rather than reviewing existing functionality for optimization, consolidation, replacement or removal. What about applying an improvement to not just one application, but multiple applications across an organization whenever possible.

There is generally at least one individual at each organization that has the attitude of “Do I write the line of code, or is there a better way?”, and “What code can be deleted as it is no longer (or was never) used?”. If all engineers considered, evaluated and implemented these concepts as a daily process, code would be more stable, it would be more lean. Does your organization have a recognition for the developer that has deleted the most lines of code from your production system?

The following is the example of a single developers improvement to a production system via deletions.
github deletions

Are there better ways of implementing functionality with the version of the technology stack already in use? Many times a newer version of software is used for one feature, but what other new or improved features also exist. This is a proactive measure to look at the features of the technology in use. This is a different type of refactoring, but the same concept in code reduction. A great example here is the use of an iterator design pattern rather than a loop. In initial deployment of an application, memory optimization may not have been obvious, however over time and increasing datasets this simple proactive action has a larger benefit for the application.

A final step in improving sustainability of the software is testing. An agile approach introduces unit testing, but testing do not stop with the validation of a single line of code. Testing encompasses how that functions with the entire system, often known as functional testing. Systems often require load testing to know the capacity before failure, not after it occurs. If as much time was spent in these two additional areas of testing, as was spent in unit testing, more robust systems would exist and the unseen benefit is the productivity to spent more time developing.

Here are a few customer examples of refactoring. Unfortunately this is an all to common occurrence.

Module bloat

An assessment of the technology stack for a newly deployed application (i.e. just a few weeks old) showed a long list of PHP and Apache modules. Without any justification as to why these modules were used, and without a willing engineering sponsor it took quite some time to first produce automated deployment duplicating this custom environment, than applicable testing to strip out what was ultimately unnecessary. The overall outcome had multiple effects. What was needed to operate the system was actually documented. What was needed was actually automated to assist in future deployments. The resulting software was more performant as it had less baggage. The resulting deployed VM image was actually over 1GB smaller after all bloat was removed. This improved the time to deploy new application servers. As this system had a very large scale up and scale down weekly, we are talking 1000% at peak times, the impact of a more lean stack had a huge impact on the true deployment times of the application. This is an attribute that can be difficult for developers to appreciate, when comparing a development environment to a production system.

This entire process and the large investment of work would have been almost non-existent if this was part of the engineering methodology used during initial development (which took over one year for initial deployment), and if more (or all) individual developers stopped to ask why are we adding additional modules. This is part of the infrastructure planning that should have a feedback loop within each iteration. This also requires both a solid experience in engineering and architectural oversight to be able to estimate the impact over a much larger time period than the development cycle.

Framework bloat

An education based client faced a huge problem. The existing system had grown over a number of years, the engineering department had grown from one developer to over a dozen developers, yet the approach towards software development had not changed from that single developer original module based Drupal approach for a small application. With sales for the next annual education cycle already 4x more than the current user base that was having regular outages, the system could not (and would not) sustain known future sales.

Often the first question asked by clients in this situation when offering performance services is “How can I scale my system 10x?” I generally counter this question with “How did you scale from when your system was 10x smaller to now?”. Aside from the interesting conversations around these responses, I often need to explain that performance is about efficiency, and this often requires a cultural change. I also generally quote one of my popular lines — “When reviewing the performance of a piece of code (or SQL statement); the first objective should not be to make it better; the first objective should be to eliminate it.” This is also generally received with blank stares and silence. Efficiency it seems perhaps is no longer taught or practiced.

As with most simple yet profound assessments an example of the clients production system can best demonstrate what inefficiency is. An analysis of the user registration process unveiled alarming result. This analysis that can happen in a very short period, e.g. an hour. In summary, 50 SQL statements were executed to register a new user to the system. A physical desk check (again foreign when you have to ask multiple people how do I print out something as a visiting consultant) of just the database access showed that with the present inefficient Drupal ‘node’ schema design, just 11 SQL statements were actually needed to complete the required task. That is, the code could be 500% more efficient and nothing has been tuned or scaled. The client needed at least a 400% immediate improvement. However, just explaining this did not convince the organizations c-level executives to reset poor development practices to addressing immediate and ongoing scalability (i.e. success of your startup). They wanted a more abstract approach, they wanted a magically sharded solution were simply throwing H/W (and $) at the problem made it go away without changing the engineering mindset. If you go back to the answer to my response question you find this is often the solution to get to the current point, that is add more servers, add caching, add read-only data access. This is not actually the solution but is adding complexity to the problem and making it more expensive to correct. In the startup ecosystem this is also known as a successful catastrophe. You reached all of your marketing and sales pitch goals, and your software crumpled under your unplanned success.

Was this problem just in user registration, or was it throughout the entire application? If looking at one common and frequent code path a 500% improvement can be made with 0% feature impact. Would that not indicate the problem exists elsewhere in the codebase. In fact, this example product was not even the classic RAT v CAT that is often a more compounding performance issue.

Further assessment of this one code path demonstrated that when an optimal schema design was architected for the purpose of the application, the number of SQL statements would be reduced to 5 (i.e. a 900% improvement). This is a significant performance and scalability benefit when using applicable architectural design and strategic planning. Performing regular architectural reviews by skilled resources in your business strategy can help to address development productivity regression long before they occur. A great architect never sees the true benefits of their work. It is a silent reward that their given experience, knowledge and expertise has an unknown financial value to an organization.

Lifecycle Management

It can be difficult to understand the impact of code in the full lifecycle of a software product in the 21st century. Until individuals have seen the birth, growth, support, longevity and death of a system it can be impossible to understand the impact some lines of code have with one application and the interoperability requirements with other applications. When the waterfall approach for SDLC was still in active use this was possible with large scale projects over time. In the post tech boom age and with the use of agile methodologies the incremental development lifecycle hides a lot of important context for better assessment of true cost savings.

The introduction and increasing popularity of the devops and site reliability roles also attempts to hide what many large organizations and successful website have, that is a dedicated operations team. Tools have done so much to enable engineers to be more productive. Automated provisioning, PaaS and CI/CD tools seamlessly enable more (abstract) code to be written to provide that essential functionality to the end user. Automated testing has replaced design documents. Organizations developer systems without is a data model? All of these tools and techniques however do not replace the intelligence needed to operate a system over time, particularly for tasks including upgrades and integrations.

One simple concept can be implemented to assist in all contributors owning lifecycle management.

The first is the responsibility of a developer being paged when a production problem occurs due to the line of code they wrote. Being responsible accepting that in the early morning or weekend you may be needed to address a problem attributed to your individual work and a failure within an entire system may make the decision to consider the larger impact more prevalent. This is taking the XP principle of ownership and defining the time dimension to a period infinitely greater than the present iteration.

The following is a great tweet that shows this developer has heard of commenting their code, but not considering lifecycle management?

// When I wrote this, only God and I understood what I was doing
// Now, God only knows

Justifying the reallocation of time

In the 1990s the concept of adding a quality step to software development via means of code reviews and automated testing was seen as an impediment to productivity. This potential cost in lost productivity could not be justified. Why would developers write tests when there is an entire QA team to test new features each time the software is released? Today it is seen as an essential component for continuous integration and delivery and the testing is designed to test all functionality repeatedly, not just new functionality.

Assigning 40% of present development time elsewhere could be viewed as a loss of productivity because today projects do not have a start and end date and deliverables where a total cost of ownership could be more clearly calculated. Today, projects are a continual ongoing evolution, even the concept of cost projection simply does not exist and therefore could be stated as impossible to validate against. After more than a decade of working with startups at many stages of evolution, the cost of not undertaking stability and lifecycle management is a far greater longer term cost to an organization by an outside observer. Look no further than the much larger turnover of technology staff in today’s organizations. These resources have institutional knowledge that is lost to the organization. This information is rarely documented as a historical artifact and the reason why steps were taken cannot be inferred from what is presently the state of the current code (or even reviewing the code revision history). This cost is rarely calculated within the software development lifecycle.

Adopting ownership

Many organizations suffer from the clash of traditional infrastructure principles with the pace of accelerated innovation. This approach helps to better balance the responsibility particularly between engineering and operations departments and improves the workflow to producing better products to the business in the longer term and ultimately to those who matter, the customer.

When developers value the total impact of a line of code in the full lifecycle of the product or service, a different mindset leads to actually writing better code. This code results in being more efficient and the carryover effect is the developer is actually more effective at writing more subsequent code.

Loops in shell scripting

If you are die hard Bourne Shell (/bin/sh) scripter, it can be a challenge not to be enticed by the syntax niceties of the Born Again Borne Shell (/bin/bash).

One example is the {..} syntax

#!/bin/bash
for I in {0..5}
do
   echo $I
done
0
1
2
3
4
5

This syntax is not valid in /bin/sh on Linux.

#!/bin/sh
for I in {0..5}
do
   echo $I
done
{0..5}

NOTE: However apparently it does work in Mac OS X, which is derived from BSD, not Linux.

/bin/sh gives you a for loop but it requires the full list of iterated values instead of a range.

#!/bin/sh

for I in 0 1 2 3 4 5
do
  echo $I
done

Note: Passing a string does not work by default.

#!/bin/sh

for I in "0 1 2 3 4 5"
do
  echo $I
done

The approach to product the same result requires some format management.

#!/bin/sh

OIFS=$IFS
IFS=" "
for I in `echo "0 1 2 3 4 5"`
do
  echo $I
done
IFS=$OIFS

You can use while

#!/bin/sh

I=0
while [ $I -le 5 ]
do 
  echo $I
  I=`expr $I + 1`
done

You can use one of several other shell commands, in this example awk

#!/bin/sh

for I in `awk 'BEGIN{for (i=0;i<=5;i++) print i}'`
do 
  echo $I
done

Or, the function specifically design for sequences of numbers seq

#!/bin/sh

for I in `seq 0 5`
do 
  echo $I
done

And for these few examples, there will be more possibilities to achieve close to feature parity of the /bin/bash syntax.
An example found on BSD is jot - 0 5. This is not available Ubuntu by default but installed with the athena-jot package. However the syntax is then different for correct usage.

What is testing?

In software development this is a simple question. What is [the purpose of] testing? If asked to give a one sentence answer what would you say? I have asked this simple question of attendees at many presentations, and also to software developers I have worked with or consulted to.

The most common answer is. “Testing is about making sure the software works, the function your testing does what it should, for example saves the information you entered”.

Unfortunately this is not the purpose of testing, and this attitude leads to what I generally term as poor quality software. “Testing is about trying to break your product any way possible, all the time.”

With this clarification in understanding of a basic and necessary software engineering principle, the attitude towards software development and the entire focus and mindset of engineering and quality assurance can change for the better.

Another very simple example which I often ask when consulting. What does your website look like when it’s down? Again, the general answer is often vague and/or incomplete. How do you know when your website is down? I have heard the response “The users will let you know”. You may laugh, but it is certainly not funny. Show me your website in a down state? Show me your website in a degraded state? When the answer is either unclear, or with a recent employment the same response, there has simply been little thought into producing a quality product by a testing process that is intent on breaking your software.

What procedures do you follow when receiving alerts about errors? What procedures do you put in place to ensure they do not happen again? Again, one has to be disappointed when the response is, “I will set up an email alert to the team for this type of error?” This reactive response is not addressing the problem, only acknowledging the existence of a problem. What is needed is being proactive. Was a bug raised? Can the problem be easily reproduced? How was the problem fixed the first time? Can this be corrected in the code? Can the interim resolution be automated?

When there is a negative user experience from any type of failure or error another important feedback loop is the post-mortem to review the when, why, how and who of the situation and to create a plan to ensure this does not happen again.

Testing needs to baked in to everything that is done, and practice makes for a more perfect outcome. In a high volume environment it is critical to have a simulated environment where you can benchmark performance of any new release for any regressions. A well defined load testing environment can be used to review experimental branches of possible performance improvements. It is also where you can determine the bottleneck and breaking point as you increase load 2X, 5X, 10X. It is impossible to be proactive when your system can fail at 2X load, and the engineering resources needed to implement a solution will not happen in time.

Disaster is inevitable. It will happen, whether small or large. Hardware and software inherently fails. How it fails and what is done to mitigate this to ensure the best possible consistent and rewarding consumer experience is only possible by consistently practicing to break your software at all stages in the development and deployment lifecycle.

Why is my database slow?

Not part of my Don’t Assume series, but when a client states “Why is my database slow””, you need to determine if indeed the database is slow.

Some simple tools come to the rescue here, one is Firebug. If a web page takes 5 seconds to load, but the .htm file takes 400ms, and the 100+ assets being downloaded from one base url, then is the database actually slow? Tuning the database will only improve the 400ms portion of 5,000ms download.

There some very simple tips here. MySQL is my domain expertise and I will not profess to improving the entire stack however perception is everything to a user and you can often do a lot. Some simple points include:

  • Know about blocking assets in your <head> element, e.g. .js files.
  • Streamline .js, .css and images to what’s needed. .e.g. download a 100k image only to resize to a thumbnail via style elements.
  • Sprites. Like many efficient but simple SQL statements, network overhead is your greatest expense.
  • Splitting images to a different domain.
  • Splitting images to multiple domains (e.g. 3 via CNAME only needed.) — Hint: Learn about the protocol
  • Cookieless domains for static assets
  • Lighter web container for static assets (e.g. nginx, lighttpd)
  • Know about caching, expires and etags
  • Stripping out http://ww.domain.com from all your internal links (that one alone saved 12% of HTML page size for a client). You may ask is that really a big deal, well in a high volume site the sooner you can release the socket on your webserver, the sooner you can start serving a different request.

Like tuning a database, some things work better then others, some require more testing then others, and consultants never tell you all the tricks.

References

As with everything in tuning, do your research and also determine what works in your environment and what doesn’t. Two excellent resources to start with are Steve Souders and Best Practices for Speeding Up Your Web Site by Yahoo.

Testability

If I was to provide one tip for organizations on how to implement a successful technology solution, I would state you need to ensure your product/software/system is completely testable. Independent on how you elect to test your system, the design of creating a completely testable infrastructure will enable exponential savings as your business grows.

You achieve this by implementing an Application Programming Interface (API) for all data access. Your goal should be to move away from technology dependence and towards a technology agnostic solution, your dependency is now your business specification. This does not mean you are going to expose this API to the Internet, your own applications are your first clients, your web site and your management reporting tools. Your website is just a client presentation of your most valuable asset, your information.

Creating an environment that enables you test and verify your information independently from how is renders in a browser, enables a complete level of possible automation for testing this component of your communication channel. While end to end testing is also necessary, this becomes more complex and is impractical if this is your only means of testing. The principle of any popular Agile methodology approach is around testing where one popular term is Test Driven Development (TDD). While you may not implement TDD, knowing and applying the principals enables testability.

As you continue to grow, you will realize you now have the infrastructure and ability to stress test your most important system features. It is a common misconception that testing is about ensuring your software works as designed. Testing should not be about what works, but what doesn’t break. The goal of testing should be to break your software. The ability to stress test your system is to know when your system will fail. This ability to predict can benefit you ahead of time. You do not want your startup to suffer a successful catastrophe where you meet all your marketing goals, but you system crashes, and while the “Twitter failed whale” is frustrating, this is one approach attempt to mediate a total failure.

More woes with java version on Ubuntu

Armed with more information on Drizzle JDBC being a JDBC 4.0 implementation (helps to explain my issues in Getting started with Drizzle JDBC) I took the time to read about some other new JDBC 4.0 features.

There was reference to handling chained exceptions, however when trying to get this working for SQLException was more complex on Ubuntu 9.04 then I anticipated.

My first problem was an apparent source level problem.

$ javac ExampleDrizzle.java
----------
1. ERROR in ExampleDrizzle.java (at line 14)
	for(Throwable e : sx ) {
	    ^^^^^^^^^^^^^^^^
Syntax error, 'for each' statements are only available if source level is 1.5

That’s weird, what java version was I running now I’d changed with update-alternatives –config java yesterday.

$ java -version
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

No issues here, a quick man reference gives me:

-1.5                    set compliance level to 1.5

I try that, and well that fixes one problem, but creates another.

$ javac -1.5 ExampleDrizzle.java
----------
1. ERROR in ExampleDrizzle.java (at line 14)
	for(Throwable e : sx ) {
	                  ^^
Can only iterate over an array or an instance of java.lang.Iterable

Now Class SQLException 1.6 javadocs shows SQLException as implementing the generics Iterable<Throwable>, while 1.5 javadoc does not. I guess I need to use 1.6 then.

$ javac -1.6 ExampleDrizzle.java
Annotation processing got disabled, since it requires a 1.6 compliant JVM
----------
1. ERROR in ExampleDrizzle.java (at line 14)
	for(Throwable e : sx ) {
	                  ^^
Can only iterate over an array or an instance of java.lang.Iterable

Wait a minute, I’m using a 1.6 compliant JVM. Double checking

$ ls -al /etc/alternatives/java*
lrwxrwxrwx 1 root root 36 2009-09-17 18:53 /etc/alternatives/java -> /usr/lib/jvm/java-6-sun/jre/bin/java
lrwxrwxrwx 1 root root 46 2009-09-17 18:53 /etc/alternatives/java.1.gz -> /usr/lib/jvm/java-6-sun/jre/man/man1/java.1.gz
lrwxrwxrwx 1 root root 31 2009-09-17 17:50 /etc/alternatives/javac -> /usr/lib/jvm/java-gcj/bin/javac
lrwxrwxrwx 1 root root 41 2009-09-17 17:50 /etc/alternatives/javac.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javac.1.gz
lrwxrwxrwx 1 root root 33 2009-09-17 17:50 /etc/alternatives/javadoc -> /usr/lib/jvm/java-gcj/bin/javadoc
lrwxrwxrwx 1 root root 43 2009-09-17 17:50 /etc/alternatives/javadoc.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javadoc.1.gz
lrwxrwxrwx 1 root root 31 2009-09-17 17:50 /etc/alternatives/javah -> /usr/lib/jvm/java-gcj/bin/javah
lrwxrwxrwx 1 root root 41 2009-09-17 17:50 /etc/alternatives/javah.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javah.1.gz
lrwxrwxrwx 1 root root 33 2009-09-11 10:06 /etc/alternatives/javap -> /usr/lib/jvm/java-6-sun/bin/javap
lrwxrwxrwx 1 root root 43 2009-09-11 10:06 /etc/alternatives/javap.1.gz -> /usr/lib/jvm/java-6-sun/man/man1/javap.1.gz
lrwxrwxrwx 1 root root 39 2009-09-11 10:06 /etc/alternatives/java_vm -> /usr/lib/jvm/java-6-sun/jre/bin/java_vm
lrwxrwxrwx 1 root root 38 2009-09-11 10:06 /etc/alternatives/javaws -> /usr/lib/jvm/java-6-sun/jre/bin/javaws
lrwxrwxrwx 1 root root 48 2009-09-11 10:06 /etc/alternatives/javaws.1.gz -> /usr/lib/jvm/java-6-sun/jre/man/man1/javaws.1.gz

javac is not using Sun Java 6. I have no idea how that happened, but it explains now the problem, should be checking javac version, not java version.

$ javac -version
Eclipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright IBM Corp 2000, 2008. All rights reserved.

What the? I was writing Java code on this server by hand, but decided last night to install eclipse after the fact. Did this affect this. I’m not certain whether I installed eclipse before or after my work last night.

I try to change the alternatives again.

$ sudo update-alternatives --config java

There are 4 alternatives which provide `java'.

  Selection    Alternative
-----------------------------------------------
*         1    /usr/lib/jvm/java-6-sun/jre/bin/java
          2    /usr/bin/gij-4.3
          3    /usr/bin/gij-4.2
 +        4    /usr/lib/jvm/java-gcj/jre/bin/java

Press enter to keep the default[*], or type selection number: 1
Using '/usr/lib/jvm/java-6-sun/jre/bin/java' to provide 'java'.

$ javac -version
Eclipse Java Compiler 0.894_R34x, 3.4.2 release, Copyright IBM Corp 2000, 2008. All rights reserved.

That doesn’t work. One needs to know that java and javac operate independently.

$ sudo update-alternatives --config javac

There are 4 alternatives which provide `javac'.

  Selection    Alternative
-----------------------------------------------
          1    /usr/lib/jvm/java-6-sun/bin/javac
          2    /usr/bin/ecj
          3    /usr/bin/gcj-wrapper-4.3
*+        4    /usr/lib/jvm/java-gcj/bin/javac

Press enter to keep the default[*], or type selection number: 1
Using '/usr/lib/jvm/java-6-sun/bin/javac' to provide 'javac'.
$ javac -version
javac 1.6.0_16

$ javac ExampleDrizzle.java

Buyer beware with Ubuntu and it’s rather messed up implementation approach toward alternative java JVM’s.

Getting started with Drizzle JDBC

In preparation for some Java work I wanted to configure and test the Drizzle JDBC Driver. Any chance to swing Drizzle into a MySQL discussion is worth the research. What I found was an issue compiling and an issue running on Ubuntu 9.04

You can start by downloading and building the Drizzle JDBC. My first problem was when I tried to build a usable .jar. I got errors in the test cases which caused by default no built .jar to work with. I raised Bug #432146 – org.drizzle.jdbc.MySQLDriverTest Tests fail. As I stated it may not be a real bug, but it seems at present that you require a running MySQL instance as well as a running Drizzle instance. In my case I didn’t have MySQL running, and I think to be fair, I should be able to build a Drizzle driver without MySQL.

Anyway, as per the Wiki Docs I proceeded to package without successful test cases. My next problem was more interesting, and perhaps found earlier from the tests?

I first created a test schema my code was going to use.

$ ~/drizzle/deploy/bin/drizzle
Your Drizzle connection id is 724
Server version: 2009.09.1126 Source distribution (trunk)

drizzle> create schema test_java;
Query OK, 1 row affected (0 sec)
drizzle> exit

I wrote a simple Java program.

$ cat ExampleDrizzle.java
import java.sql.*;

public class ExampleDrizzle {

  public static void main(String args[]) {

    try {
      Class.forName("org.drizzle.jdbc.Driver");
    } catch (Exception e) {
      System.out.println(e.getMessage());
      System.exit(1);
    }

    try {
      Connection con = DriverManager.getConnection("jdbc:drizzle://localhost:4427/test_java");
      Statement st = con.createStatement();
      st.executeUpdate("CREATE TABLE a (id int not null primary key, value varchar(20))");
      st.close();
      con.close();
    } catch (SQLException e) {
      System.out.println(e.getMessage());
    }
  }
}

Compiled.

$ javac ExampleDrizzle.java

Ran.

$ java ExampleDrizzle
org.drizzle.jdbc.Driver not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:mysql-connector-java-5.1.8-bin.jar,file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}

Oops, been a while since using Java. I was amazed I could write the code in vi in the first place.

$ export CLASSPATH=drizzle-jdbc-0.5-SNAPSHOT.jar:.
$ java ExampleDrizzle
17-Sep-09 6:48:45 PM org.drizzle.jdbc.internal.drizzle.DrizzleProtocol 
INFO: Connected to: localhost:4427
Exception in thread "main" java.lang.NoClassDefFoundError: org.drizzle.jdbc.DrizzleConnection
   at java.lang.Class.initializeClass(libgcj.so.90)
   at org.drizzle.jdbc.Driver.connect(Driver.java:74)
   at java.sql.DriverManager.getConnection(libgcj.so.90)
   at ExampleDrizzle.main(ExampleDrizzle.java:15)
Caused by: java.lang.ClassNotFoundException: java.sql.SQLFeatureNotSupportedException not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:drizzle-jdbc-0.5-SNAPSHOT.jar,file:./], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
   at java.net.URLClassLoader.findClass(libgcj.so.90)
   at java.lang.ClassLoader.loadClass(libgcj.so.90)
   at java.lang.ClassLoader.loadClass(libgcj.so.90)
   at java.lang.Class.forName(libgcj.so.90)
   at java.lang.Class.initializeClass(libgcj.so.90)
   ...3 more

Hmmm, that’s disappointing. I thought about it a minute, figured some guidance would be beneficial , so I sought out the best Java person on #drizzle IRC. Getting a name, but no response from an initial inquiry after about a half hour I thought again at the problem. Just what java are you using?

$ java -version
java version "1.5.0"
gij (GNU libgcj) version 4.3.3

$ ls -l /usr/bin/java
lrwxrwxrwx 1 root root 22 2009-07-17 12:36 /usr/bin/java -> /etc/alternatives/java

$ sudo find / -name java
[sudo] password for rbradfor:
/usr/lib/java
/usr/lib/ure/share/java
/usr/lib/jvm/java-6-sun-1.6.0.16/bin/java
/usr/lib/jvm/java-6-sun-1.6.0.16/jre/bin/java
/usr/lib/jvm/java-1.5.0-gcj-4.3-1.5.0.0/bin/java
/usr/lib/jvm/java-1.5.0-gcj-4.3-1.5.0.0/jre/bin/java
/usr/bin/java
/usr/include/c++/4.3/gnu/java
/usr/include/c++/4.3/java
/usr/local/include/google/protobuf/compiler/java

$ ls -l /etc/alternatives/j*
...
lrwxrwxrwx   1 root root    33 2009-09-17 17:50 jar -> /usr/lib/jvm/java-gcj/jre/bin/jar
lrwxrwxrwx   1 root root    39 2009-09-17 17:50 jar.1.gz -> /usr/lib/jvm/java-gcj/man/man1/jar.1.gz
lrwxrwxrwx   1 root root    35 2009-09-17 17:50 jarsigner -> /usr/lib/jvm/java-gcj/bin/jarsigner
lrwxrwxrwx   1 root root    45 2009-09-17 17:50 jarsigner.1.gz -> /usr/lib/jvm/java-gcj/man/man1/jarsigner.1.gz
lrwxrwxrwx   1 root root    34 2009-09-17 17:50 java -> /usr/lib/jvm/java-gcj/jre/bin/java
lrwxrwxrwx   1 root root    40 2009-09-17 17:50 java.1.gz -> /usr/lib/jvm/java-gcj/man/man1/java.1.gz
lrwxrwxrwx   1 root root    31 2009-09-17 17:50 javac -> /usr/lib/jvm/java-gcj/bin/javac
lrwxrwxrwx   1 root root    41 2009-09-17 17:50 javac.1.gz -> /usr/lib/jvm/java-gcj/man/man1/javac.1.gz
...

I wonder if I should use the real Sun Java.

$ sudo apt-get install sun-java6-jdk
Reading package lists... Done
Building dependency tree
Reading state information... Done
sun-java6-jdk is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
$ sudo update-alternatives --config java

There are 4 alternatives which provide `java'.

  Selection    Alternative
-----------------------------------------------
          1    /usr/lib/jvm/java-6-sun/jre/bin/java
          2    /usr/bin/gij-4.3
          3    /usr/bin/gij-4.2
*+        4    /usr/lib/jvm/java-gcj/jre/bin/java

Press enter to keep the default[*], or type selection number: 1
Using '/usr/lib/jvm/java-6-sun/jre/bin/java' to provide 'java'.

$ ls -l /usr/bin/java
lrwxrwxrwx 1 root root 22 2009-07-17 12:36 /usr/bin/java -> /etc/alternatives/java
$ ls -l /etc/alternatives/java
lrwxrwxrwx 1 root root 36 2009-09-17 18:53 /etc/alternatives/java -> /usr/lib/jvm/java-6-sun/jre/bin/java

Yep, it took a minute to discover the update-alternatives command, lucky I didn’t try that manually.

A second try.

$ javac ExampleDrizzle.java
$ java ExampleDrizzle
Sep 17, 2009 6:54:22 PM org.drizzle.jdbc.internal.drizzle.DrizzleProtocol 
INFO: Connected to: localhost:4427
Sep 17, 2009 6:54:22 PM org.drizzle.jdbc.internal.drizzle.DrizzleProtocol close
INFO: Closing connection
Sep 17, 2009 6:54:22 PM org.drizzle.jdbc.internal.common.packet.AsyncPacketFetcher run
INFO: Connection closed

$ ~/drizzle/deploy/bin/drizzle test_java
Server version: 2009.09.1126 Source distribution (trunk)

drizzle> show tables;
+---------------------+
| Tables_in_test_java |
+---------------------+
| a                   |
+---------------------+
1 row in set (0 sec)

drizzle> desc a;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int         | NO   | PRI | NULL    |       |
| value | varchar(20) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0 sec)

And I’ve got a working testcase.

CommunityOne East – An open developer conference

With an opening video from thru-you.com – an individual taking random you-tube video and producing video mashup’s, the CommunityOne East conference in New York, NY beings.

The opening introduction was by Chief Sustainability Officer Dave Douglas. Interesting job title.

His initial discussion was around what is the relationship between technology and society. A plug for his upcoming book “Citizen Engineer” – The responsibilities of a 21st Century Engineer. He quotes “Crisis loves an Innovation” by Jonathan Schwartz, and extends with “Crisis loves a Community”.

He asks us to consider the wider community ecosystem such as schools, towns, governments, NGO’s etc with our usage and knowledge of technology.

Event: CommunityOne East in New York, NY.
Author: Ronald Bradford

Project Darkstar

It may sound like either a astronomical research project or a Star Wars spin- off, but Project Darkstar is an open source infrastructure from Sun Microsystems that states “simplify the development and operation of massively scalable online games, virtual worlds, and social networking applications.”

The advertising sounds promising like many sites, the emphasis seems to be on gaming throughout the material, interesting they threw in the term “social networking applications” specifically in opening descriptions.

I believe worthy of investigation, if only to see how that solve some classic problems. So, Learn some more, Start your rockets and Participate.

A summary introduction to Agile

Agile Development Methodology: – Most popular Implementations: Extreme Programming (XP), SCRUM, Crystal

Links

Books Highly Recommended

Extreme Programming Explained Extreme Programming Pocket Book More books on my Library page.

Tutorial – Beginner Web Services

An introduction to using Axis.

What is Axis?

Axis is essentially a SOAP engine — a framework for constructing SOAP processors such as clients, servers, gateways, etc. The current version of Axis is written in Java. But Axis isn’t just a SOAP engine — it also includes:

  • a simple stand-alone server,
  • a server which plugs into servlet engines such as Tomcat,
  • extensive support for the Web Service Description Language (WSDL),
  • emitter tooling that generates Java classes from WSDL.
  • some sample programs, and
  • a tool for monitoring TCP/IP packets.

Pre-Requisites

Installation

su -
cd /opt
wget http://apache.ausgamers.com/ws/axis/1_4/axis-bin-1_4.tar.gz
tar xvfz axis-bin-1_4.tar.gz
ln -s axis-1_4/ axis
echo "AXIS_HOME=/opt/axis;export AXIS_HOME" > /etc/profile.d/axis.sh
. /etc/profile.d/axis.sh
cp -r $AXIS_HOME/webapps/axis $CATALINA_HOME/webapps
catalina.sh stop
catalina.sh start

At this time, you should be able to confirm this installation was initially successful by going to http://localhost:8080/axis/

Installed Axis Options

The default Axis page, gives you a number of options. To confirm the installation, select the Validate Axis Link http://localhost:8080/axis/happyaxis.jsp. If there is anything missing this page will report it. In my case I was missing XML Security, which is optional.

cd /tmp
wget http://xml.apache.org/security/dist/java-library/xml-security-bin-1_3_0.zip
unzip  xml-security-bin-1_3_0.zip
cp xml-security-1_3_0/libs/xmlsec-1.3.0.jar /opt/tomcat/common/lib
catalina.sh stop
catalina.sh start

One of the links from the default home page are http://localhost:8080/axis/servlet/AxisServlet which Lists services.

First Use

One of the nicest parts of AXIS is its “instant Web service” feature called Java Web Service (JWS) — just take a Java file, rename it, and drop it into TOMCAT_HOME/webapps/axis to make all of the (public) methods in the class callable through Web services.

Quote.java

import java.util.HashMap;
import java.util.Map;

public class Quote {
  private HashMap quotes = null;
  public Quote() {
    quotes = new HashMap();
    quotes.put("Groucho Marx", "Time flies like an arrow.  Fruit flies like a banana.");
    quotes.put("Mae West", "When women go wrong, men go right after them.");
    quotes.put("Mark Twain", "Go to Heaven for the climate, Hell for the company.");
    quotes.put("Thomas Edison", "Genius is 1% inspiration, 99% perspiration.");
  }
  public String quote(String name) {
    String quote;
    if (name == null || name.length() == 0
      || (quote = (String) quotes.get(name)) == null) {
      quote = "No quotes.";
  }
  return (quote);
  }
  public int count() {
    return quotes.size();
  }
}
cp Quote.java /opt/tomcat/webapps/axis/Quote.jws

http://localhost:8080/axis/Quote.jws
http://localhost:8080/axis/Quote.jws?wsdl

More details can be found at Getting Started using Web Services with Tomcat and Axis.

What’s Next

In my next Tutorial, I’ll be moving to the practical use of Web Services using WSDL.

References

When is a batch job successful?

Simple enough question, and it’s a simple enough answer. When the batch job/process in question successfully completes what it is designed to do and not in error.

I’m attempting to test, integrate and document some developed code on a client site, and well, I’m disgusted. (as with most things, is an accumulation of a number of things that lead to these frustrations.)

The process is broken down into two parts, lets call these X and Y. Now Y is the most stable part of a long standing product, it’s the API calls to the database. X does some pre-processing, then calls Y, then reports back success/failure.

Simple enough, and these are batch processes run after hours, so operators that don’t have the business knowledge need to know success or failure.

I’ll set aside for the moment that the calling process (which is indeed a shell script wrapper around the Java code) returns a status 0-Success and 1-Failure. This is practically useless because even when X fails, it doesn’t necessary report that (another story, but part of the same frustration)

I’ve extracted a small portion of the XML response that is returned from Y, that is then inteperated by X.

...
 <Status>
  <Code>0</Code>
  <Description>Success</Description>
  <DateTime>2006-09-26T16:03:45</DateTime>
 </Status>
 <Result>
   <OutputFileName>Not.Real.Name.Output.txt</OutputFileName>
 </Result>
...
...
<Severity>
  <Code>FATAL</Code>
    <Description>A fatal error was encountered while processing. See the reason code and description for further details.</Description>
</Severity>
<Reason>
 <Code>50</Code>
 <Description>XML exception: XML parse error on line: 9, position: ....
</Reason>
...

I’m not being told, “Oh, that’s a problem”. I’m been attempted to be convinced that it’s not an error, it is success.

Well, I don’t know from what planet you have lobbed in from, but in by book, FATAL is FATAL. Check out Handling Error Levels in Logging.

What’s the most depressing is I’m expected to hand this over to the customer for testing. My job isn’t actually testing, it’s integration and documentation for the end user, but the level of quality has demanded that I test it onsite before passing on. Well, I’m not going to give this to the customer, which makes it hard when the developers (who are on the same team as me) don’t see this as a problem.

PS: The list of articles of this nature has grown to the point, I’ve created my own “The Daily WFT” category. I’ve had a lot of stories I’ve never written about, perhaps I’ll pen a few more now.

Handling Error Levels in Logging

In reviewing some provided code to a client, I observed a number of actions contray to generally accepted practices regarding logging. This is what I provided as the general programming conventions with regardings to logging.

Using Log4J (http://logging.apache.org/log4j/docs/), which is generally accepted as the benchmark for all Java applications. This provides the following logging levels.

  • FATAL
  • ERROR
  • WARN
  • INFO
  • DEBUG
  • TRACE – from 1.2.12, latest is 1.2.13

A description for what handling should occur per logging level.

FATAL. As the name suggests, all processing should stop. Should logging include a FATAL, the process is a Failure.

ERROR. An error has occured, and this requires attention, and action. Generally processing should stop, however additional post processing, or an alternative path could occur. Should logging include an ERROR, the process is a Failure.

WARN. Something that is unexpected occured, however it doesn’t affect the general processing from succeeding successfully. If a process includes WARN and not FATAL/ERROR it should be considered successful.

INFO. Information Only. On high volume systems, this level of logging may even be turned off. This generally indicates key information values or steps, and can assist when enabled in longer running processes to identify where a process is. You don’t log errors at INFO.

DEBUG. For Debugging Purposes only.

Ok, well that sounds like common sense. Here is what I observed on a client site.

  • Code logs a FATAL, but continues processing
  • A FATAL is logged, yet the calling process reports success
  • An ERROR is logged, yet the calling process reports success.
  • A lot of WARN are logged, and this is misleading, as it appears more information regarding XML elements not processed (We are talking 20+ Warnings per batch process). From what I’ve observed, these don’t require futher action, and should be changed in INFO.
  • Errors are being logged as INFO. A NullPointer RunTime Exception is logged as INFO. If an error provides an Exception argument where a stack trace is printed, it ain’t an INFO message.

Securing a Tomcat Webapp – Part 2

If you wish to password protect your webapp with an Apache .htaccess type authentication model, you require two configuration steps. The first within your WEB-INF/web.xml, add the following replacing rolename appropiately.

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>All Pages</web-resource-name>
        <url-pattern>*.htm</url-pattern>
        <url-pattern>*.html</url-pattern>
   </web-resource-collection>
    <auth-constraint>
       <role-name>rolename</role-name>
    </auth-constraint>
  </security-constraint>

  <!-- Define the Login Configuration for this Application -->
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Application</realm-name>
  </login-config>

  <!-- Security roles referenced by this web application -->
  <security-role>
    <description>
      The role that is required to log in to the Application
    </description>
    <role-name>rolename</role-name>
  </security-role>

Second, within the tomcat $CATALINA_HOME/conf/server.xml, you need to define the Realm used within the appropiate host’s <Engine> definition.

  <Realm className="org.apache.catalina.realm.UserDatabaseRealm" debug="0" resourceName="UserDatabase"/>

This Realm connects with a known resource, which I define with the $CATALINA_HOME/conf/server.xml <GlobalNamingResources> definition.

<Resource name="UserDatabase" auth="Container"
          type="org.apache.catalina.UserDatabase"
          description="User database that can be updated and saved">
</Resource>
<ResourceParams name="UserDatabase">
    <parameter>
        <name>factory</name>
        <value>org.apache.catalina.users.MemoryUserDatabaseFactory</value>
    </parameter>
    <parameter>
        <name>pathname</name>
        <value>conf/custom/users.xml</value>
    </parameter>
</ResourceParams>

NOTE: The use of MemoryRealm has limited uses. Tomcat provides 5 different Realm implementations including JDBC, DataSource,JINDI, Memory and JAAS.

And of course you need to define your user authentication within the appropiately defined users file. In this case conf/custom/users.xml

Securing a Tomcat Webapp

If you require a webapp to always run in https mode using a SSL key, then you need to add the following to your WEB-INF/web.xml configuration.

 <security-constraint>
    <web-resource-collection>
        <web-resource-name>jsp</web-resource-name>
        <url-pattern>*.htm</url-pattern>
        <url-pattern>*.html</url-pattern>
    </web-resource-collection>
    <user-data-constraint>
        <transport-guarantee>CONFIDENTIAL</transport-guarantee>
    </user-data-constraint>
</security-constraint>

Mercurial Version Control Software

I got asked (being a Java developer) about what was involved in creating an Eclipse Plugin for Mercurial. Well in true Google style, why invent when somebody probably already has. A quick check finds Mercurial Eclipse by VecTrace.

Now until last week, I’d never heard of Mercurial, so this is really an introduction to somebody that has no idea.

What is Mercurial?

Mercurial is a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.

Ok, so big deal, I use CVS. I also use Subversion (SVN) for my Apache contributions, and also for MySQL GUI products. Why do we need another Version Control Product? Mercurial is a Distributed Software Configuration Management Tool. The following is from the Mercurial Wiki

A distributed SCM tool is designed to support a model in which each Repository is loosely coupled to many others. Each Repository contains a complete set of metadata describing one or more projects. These repositories may be located almost anywhere. Individual developers only need access to their own repositories, not to a central one, in order to Commit changes.

Distributed SCMs provide mechanisms for propagating changes between repositories.

Distributed SCMs are in contrast to CentralisedSCMs.

So, clearly a distributed model would well in a large distributed organisation where external factors limit continous access to a single central repository. Low Bandwidth, Poor Internet Connectivity, being on a plane, and travelling are all things that would make a distributed model a more ideal solution. I know I’ve taken my laptop away, and being an “Agile Methodology” developer, I commit often. When you have several days of uncommitted work it goes against the normal operation.

You can get more information at the official website at http://www.selenic.com/mercurial. A few quick links are: Quick Start Guide, Tutorial, Glossary.

Installing Mercurial

su -
cd /src
wget http://www.selenic.com/mercurial/release/mercurial-0.9.tar.gz
tar xvfz mercurial-0.9.tar.gz
cd mercurial-0.9
# NOTE: Requires python 2.3 or better.
python -V
python setup.py install --force

A quick check of the syntax.

$ hg
Mercurial Distributed SCM

basic commands (use "hg help" for the full list or option "-v" for details):

 add        add the specified files on the next commit
 annotate   show changeset information per file line
 clone      make a copy of an existing repository
 commit     commit the specified files or all outstanding changes
 diff       diff repository (or selected files)
 export     dump the header and diffs for one or more changesets
 init       create a new repository in the given directory
 log        show revision history of entire repository or files
 parents    show the parents of the working dir or revision
 pull       pull changes from the specified source
 push       push changes to the specified destination
 remove     remove the specified files on the next commit
 revert     revert files or dirs to their states as of some revision
 serve      export the repository via HTTP
 status     show changed files in the working directory
 update     update or merge working directory

A more detailed list:

$ hg help
Mercurial Distributed SCM

list of commands (use "hg help -v" to show aliases and global options):

 add        add the specified files on the next commit
 annotate   show changeset information per file line
 archive    create unversioned archive of a repository revision
 backout    reverse effect of earlier changeset
 bundle     create a changegroup file
 cat        output the latest or given revisions of files
 clone      make a copy of an existing repository
 commit     commit the specified files or all outstanding changes
 copy       mark files as copied for the next commit
 diff       diff repository (or selected files)
 export     dump the header and diffs for one or more changesets
 grep       search for a pattern in specified files and revisions
 heads      show current repository heads
 help       show help for a given command or all commands
 identify   print information about the working copy
 import     import an ordered set of patches
 incoming   show new changesets found in source
 init       create a new repository in the given directory
 locate     locate files matching specific patterns
 log        show revision history of entire repository or files
 manifest   output the latest or given revision of the project manifest
 merge      Merge working directory with another revision
 outgoing   show changesets not found in destination
 parents    show the parents of the working dir or revision
 paths      show definition of symbolic path names
 pull       pull changes from the specified source
 push       push changes to the specified destination
 recover    roll back an interrupted transaction
 remove     remove the specified files on the next commit
 rename     rename files; equivalent of copy + remove
 revert     revert files or dirs to their states as of some revision
 rollback   roll back the last transaction in this repository
 root       print the root (top) of the current working dir
 serve      export the repository via HTTP
 status     show changed files in the working directory
 tag        add a tag for the current tip or a given revision
 tags       list repository tags
 tip        show the tip revision
 unbundle   apply a changegroup file
 update     update or merge working directory
 verify     verify the integrity of the repository
 version    output version and copyright information

Mercurial Eclipse Plugin

The plugin is still in it’s early days, but the “FREEDOM” of open source enables me to easily review. After a quick install and review of docs, I shot off an email to the developer, stating why I was looking, and while I have other projects on the go, I asked what I could do to help. It’s only be 2 days and we have already communicated via email several times on various topics. That’s one reason why I really love the Open Source Community. Generally people are very receptive to feedback, comments and especially help.

Within Eclipse

  • Help ->Software Updates-> Find and install…
  • Select “Search for new features to install”, click Next
  • Click “New Remote site…”
  • Enter following details and click Ok
    • Name: MercurialEclipse Beta site
    • URL: http://zingo.homeip.net:8000/eclipse-betaupdate/
  • Follow the prompts to accept the license and download.

So now with Eclipse, on a project you can simply go Right Click -> Team -> Share Project -> Select Mercurial

A Quick Mercurial Tutorial

Of course the quickest way to learn about using Mercurial is to look at an existing product. So taking this plugin project for a spin.

$ cd /tmp
$ hg clone http://zingo.homeip.net:8000/hg/mercurialeclipse com.vectrace.MercurialEclipse
$ hg clone com.vectrace.MercurialEclipse example
$ cd example
# Create some new dummy files
$ touch test.txt
$ touch html/test.html
# View files against respository status
$ hg status
? html/test.html
? test.txt
# Add the new files
$ hg add
adding html/test.html
adding test.txt
# Commit changes
$ hg commit -m "Testing Mercurial"

So other then the second clone command (which enabled me to not mess up the original repository and to test distributed handling next), this is just the same as CVS (checkout, diff, add, commit)

# The Distributed nature involves first Pulling from the "upstream" respository
$ hg pull ../com.vectrace.MercurialEclipse
pulling from ../com.vectrace.MercurialEclipse
searching for changes
no changes found
# Confirm our new file is not in "upstream" respository
$ ls ../com.vectrace.MercurialEclipse/test.txt
ls: ../com.vectrace.MercurialEclipse/test.txt: No such file or directory
# Push local respository changes to "upstream" respository
$ hg push ../com.vectrace.MercurialEclipse
pushing to ../com.vectrace.MercurialEclipse
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 2 changes to 2 files
$ ls ../com.vectrace.MercurialEclipse/test.txt
ls: ../com.vectrace.MercurialEclipse/test.txt: No such file or directory

Hmmm, missed something here. The Quick Start Guide docs seems to want to work back into the original respository, pulling in changes from the “downstream” repository, then executing a merge and commit command. I don’t want to do this just in case I start messing up the original repository. Time for me to create a new standalone repository so I don’t screw anything up. Stay tuned for an updated tutorial.

Of course this is very basic, I need to look into commands used in CVS like login, tagging (versioning), branching, diffing revisions and merging but this is a start.

I do have a concern with the number of distributed respositories, when and how do you know you are in sync with the “master”, indeed what is the “master”. It does merit some investigation to see what management is in place, like identifying all respositories, and comparing for example.

Conclusion

Ok, so now I’ve got a grasp on Mercurial, time to review the Java Code and see what works and what doesn’t in the Eclipse environment. Of course I don’t use Mercurial so what I may consider as functionality required, may be lower priority to those users out there. Any feedback I’m sure will be most appreciated by the original developer.

Generating an internal SSL Certificate (for tomcat)

How to Generate an internal SSL certificate

Create the self-signed keystore

$ su -
$ URL="your.url.here";export URL
$ cd /opt/tomcat/conf
$ keytool -genkey -alias ${URL} -keyalg RSA -keystore ${URL}.keystore
Enter keystore password:  changeit
What is your first and last name?
  [Unknown]:  your.url.here
What is the name of your organizational unit?
  [Unknown]:  IT
What is the name of your organization?
  [Unknown]:  your.url.here
What is the name of your City or Locality?
  [Unknown]:  Brisbane
What is the name of your State or Province?
  [Unknown]:  QLD
What is the two-letter country code for this unit?
  [Unknown]:  AU
Is CN=your.url.here, OU=IT, O=your.url.here, L=Brisbane, ST=QLD, C=AU correct?
  [no]:  yes

Enter key password for <your.url.here>
        (RETURN if same as keystore password):

Turn the keystore into a X.509 certificate

$ keytool -export -alias ${URL} -keystore ${URL}.keystore -rfc -file ${URL}.cert
Enter keystore password:  changeit
Certificate stored in file <your.url.here.cert>

Delete existing trusted certificate

$ keytool -delete -alias ${URL} -file ${URL}.cert  -keystore /opt/java/jre/lib/security/cacerts  -storepass changeit

Import the certificate into cacerts – JRE trusted certificates

$ keytool -import -alias ${URL} -file ${URL}.cert  -keystore /opt/java/jre/lib/security/cacerts  -storepass changeit
Owner: CN=your.url.here, OU=IT, O=your.url.here, L=Brisbane, ST=QLD, C=AU
Issuer: CN=your.url.here, OU=IT, O=your.url.here, L=Brisbane, ST=QLD, C=AU
Serial number: 44ab628c
Valid from: Wed Jul 05 01:56:12 CDT 2006 until: Tue Oct 03 01:56:12 CDT 2006
Certificate fingerprints:
         MD5:  EC:76:01:04:7F:FC:21:CC:A8:41:AD:86:C8:B2:D5:6D
         SHA1: 2D:FD:7C:56:65:70:36:1B:1D:71:09:41:84:98:E6:8E:89:18:BC:18
Trust this certificate? [no]:  yes
Certificate was added to keystore


If you replaced an existing certificate you will need to restart Tomcat.

Guidelines for managing embedded external project dependencies

I’ve yet to find any Java project that doesn’t have dependancies on some other Open Source external libraries. I’ve yet to find a Java project that manages these external dependencies appropiately for support and integration at an enterprise level.

As with most projects, understanding an applying sound principles that scale will help you at a later date, and generally the cost of implementation is minimual at the start, but of course becomes more expensive when it’s really needed. The classic case is Version Control. For over 10 years, even on small single developer projects, I’ve used Version Control, it should be taught at university as an introduction to good programming design, it would greatly benefit software development and maintenance.

Back onto the topic of hand. Let’s use a moderate Java Web Based application, and for the purposes of this discussion the following Open Source external libraries are incoporated. Log4J, JUnit, Canoo WebTest, MySQL JDBC, Apache Commons (Collections, DHCP, Pool, HTTPClient, Taglibs Mailer). I could continue, but this will suffice for the demonstration.

It’s very easy for your project to include the appropiate jar’s such as (log4j.jar, junit.jar, commons-pool.jar etc), however this is where support and integration with other products fall down.

A Controlled Approach

You need to keep a seperate repository (under source code control of course) of your external libraries, and this becomes the source across all corporate projects. This is to include the following for each library:

  • The actual deployed jar
  • The matching source code of the deployed jar
  • Java Documentation of deployed jar

Versioning

Log4J is an example of an Open Source project that does version their jars, will many other open source projects do not. Why don’t they? Well one reason is to enable people to upgrade easily but simply overriding existing versions, and processes that have specific CLASSPATH’s are not affected. Generally today, implementations of software include all jar’s within a specified directory so I don’t see the problem.
Log4J gives in this example a log4j-1.2.12.jar for deployment purposes. When libraries do not include a version number, the are to be specifically added. This adds another small delemma of standards. The general practice is to use the hyphen ‘-‘. followed by the product version using the full-stop ‘.’, however there are projects that don’t follow this.

Version Recording

So now we have for each external library, an appropiately versioned jar, and matching source and documentation. This is the initial baseline. What’s needed is a simple HTML index that manages this information for use. The Index should include:

  • Product Name
  • Product URL
  • Repository Version
    • Version Number
    • Version Date
    • Download Date
  • Latest Version
    • Version number
    • Date
    • Comment

You may ask, why do you record the Latest Version, when the practice should be to always get the latest version. JUnit is a good example, the present version 4.x, requires a JDK 1.5.x deployment, and if your application is running only 1.4.x, then you can only use older 3.8.x versions.s mentioned earlier,

Management

Having an index of external libraries is one thing, correct use and management is the most important step. Let’s assume we have taken the time to download and document the required libraries from our example, and everything has been deployed into our first project.

Now, a 2/3 month task of checking for updated versions can be scheduled. Withing this process, newer versions can be downloaded and recorded appropiately. In our example, Log4J now has 1.2.13. Updating the external libraries repository is the simple part, the next step is to notify all coporate projects of the new version, and to encourage uptake. This may not always occur in a timely manner, but with at least this baseline in place and when there are issues, standardisation on the known coporate version is the first step.

Dependencies

Within each project libaries, a readme that details which versions of which external jars are included andwhen they were last updated from the repository should be done. Noting this information with the both the external libraries repository and the project repository provides a paper trail. In addition, should there be any exceptions, this is the place this information can be reported.

External Projects

Canoo WebTest is a good example of an external project that also includes other external libraries such as Log4J, JUnit, HtmlUnit, NekoHtml.
Problems arise when these products may use and implement older or unknown versions of libraries.

Internal Projects

Having internal projects that are dependent on other internal projects is nothing new in a large corporate enterprise. The problem arises when a spaghetti of undocumented dependancies causes a management nightmare. Let’s take this real life example.

Product A has included jars of Product B, Product C, Product D and Product E. The Product C actually has a different version of Product D embedded within in. Product D which now is included twice also includes Product E, so there are now three copies of this, all are different in size, and all a binary only with no version numbers, and no corresponding matching source code. Does this sound bad? Well it is. How it ever worked is still amazing.

This mess could have been managed first with Version Control (a basic 101 in software development), and an appropiate management of external libraries, and a similar approach to internal libraries.

An Example

This is a great example to highlight the cost in lack of appropiate management. I was supporting an existing large scale project (1000+ users) (let’s call this Product A), and the integration of a new project (let’s call this Product B) had been passed from the development team for implementation, testing and release. A threaded process, it would simply just hang after some initialisation, no notice, no errors, just nothing. Not withstanding that something should have been better reported for the errors. Due to 7 possible log files between the software application and the application server, nothing was reported, but that’s another topic.

The final result was Product B had introduced the use of org.apache.log4j.Logger.trace(), a new more granular logging then debug(). The appropiate Log4J jar had been included in the product, and this was Version 1.2.12. However, Product A, which was using Product B, was bundled with an earlier version of Log4J, Version 1.2.8, and this version didn’t support this new method.

While it took a few hours of debugging to find this problem, it was made easier because at least these jar’s were version, of the 20-30 jars across products only 3-4 were versioned. Similar problems with QName and XMLBeans unamed jars prior to this took days to resolve (indeed one had to be worked around as it couldn’t be resolved).

A further complication in this process was when Product B was introduced. This was developed and built under Linux, while Product A was still being maintained under Windows. From the experience of Integration is was found that the order of loading within the classloader of a commerical application server differed between operating systems.

What constitutes a good error message to the user?

Today, will go down in my professional history as quite possibly the lowest I would ever think of a software developer. I’ve carefully avoided the term “fellow coder”, speaking of a IT industry sticking by fellow IT people, but not today.

I presently support an existing production system, some 1000+ users that’s been in place around 3 years in which I’ve had no prior involvement until recently. Integration with other external parties within the system have provided an esclation in errors being reported in this external communication, and lack of adequate feedback to the user is another topic. Email is the means of reporting administratively of user errors, again another topic of issue. Within these emails, which are almost impossible to manage due to the limited internal GUI only toolset and lack of access to actual email account files to automate scripting (yet another topic? Do you see a trend here), is some relevent information regarding the transaction, and then most importantly the error message. The thing I need to investigate the problem.

The line reads (removing some stuff):


Error Code: 234567892

Ok, well a little cryptic, but surely you can work out from the external system what this means. Investigation of some more errors, in the mail GUI product, yet another series of open windows (you can’t view messages like a regular mail client with a summary list and a detail panel), provides for a trend in the error messages:


Error Code: 1235492567
Error Code: -434783134
Error Code: 34345199

The trend being there is none. Of course today by mid morning the email error count is into the hundreds, and I’m none the wiser. Well time to closely investigate the code management (as I’ve already contacted the external party, and asked if I can provide some error codes to receive greater information).

The following constitutes the two lines of code taken in the determination of the error messages I’ve shown so far. Note, this code takes the external system response, and then “attempts to determine usefull error content for presentation back to the user”.


errorNo = new Random().nextInt();
error = “Error Code: ” + errorNo;

Now while everybody laughed out loud, including fellow developers, DBA, IT manager, the Business Owners and Users (which can’t read code but could understand the first of these two lines I highlighted), and yes it really was funny in context with a bigger picture, but really it wasn’t funny at all. Some things make my blood boil, and this was one of them. With all the time lost between multitudes of people, users, call centre etc, I’d never felt a stronger conviction to hunt down the developer that wrote this.

The end of the story is after even trolling old CVS repository entries I was unable to piece sufficient information to determine the author. Most likely done pre version control, and then that trail leads to only a few names I’ve heard mentioned before.

I’d like to see somebody top that one!

FireFox's Live Bookmarks

I’m just about to launch a new project I’ve been working on in the past week. It has a RSS feed, and I wanted to ensure that within FireFox, this could be picked up as a live bookmark. This alone is a very cool feature. Using my Blog for reference, the following code is used.


	<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://blog.arabx.com.au/?feed=rss2" />
	<link rel="alternate" type="text/xml" title="RSS .92" href="http://blog.arabx.com.au/?feed=rss" />
	<link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://blog.arabx.com.au/?feed=atom" />

So adding a RSS2 live bookmark link is as simple as a header tag.

References
Live Bookmarks – Mozilla Description
Good Introduction Tutorial

How I work

My work life is really fragmented at present, so I’ve decided a split approach in answer to Dave Rosenberg’s How I Work–what I have learned so far .

What is your role?
Support Developer providing sole support an internal client web based system (Java, Oracle) with 1000+ users producing >$1m daily revenue. Independent Database Consultant. Specialising in Database Modelling, Large Systems Design and Web Development Technologies. Strong proponent of Agile Development Methodologies (specifically Extreme Programming – XP).
What is your computer setup?
Dell Optiplex GX280 (P4, 1GB, 17″ CRT) running Redhat FC4.
(When I started only 2 months ago I got lumped with a similar speced machine running Windows XP, and upteen windows apps (like 4 different versions of TOAD for example) . This OS was contry to my understanding provided in the interview process.
Primary – Dell Inspiron 5150 (P4/3.2GHz, 1GB, 120GB, 15″ + 21″ CRT)
Secondary – Generic (P3/600MHz, 1GB, lots of HDD ~500GB, 17″ CRT)
Both with very impressive Saitek backlit keyboards (one red, one blue) Great at night. See image at bottom.
What desktop software applications do you use daily?
RedHat Fedora Core 4
Eclipse 3.1, FireFox 1.5, Thunderbird 1.5, WebSphere Application Server 5.1, J2DK 1.4.2, Oracle SQL Developer, Open Office 2, XMMS and SSH client (which I use most)
Due to legacy internal systems and support I also must run under Wine (Internet Exploder), and First Class (email client).
Not to stop there, I also must run under Citrix ICA Client apps (FocalPoint, Heat Call Logging, and Microsoft Word for internal forms that won’t work under OO2.) And before somebody suggests why not try VMWare or other clients, I have tried, but software like Focalpoint can’t install?
CentOS 4.3
FireFox 1.5, ThunderBird 1.5, Gaim, SSH, Skype, Open Office 2.
Maybe not all of these every day, but some combination of each day –> MySQL 4.1, MySQL 5.0, MySQL 5.1, MySQL Workbench, Eclipse 3.1, J2DK 1.4.2, J2DK 5.0, Apache Tomcat 5.0.28, Apache Httpd 2.0.53, JMeter.

Presently also configuring a new laptop drive running Ubuntu Dapper 6.06 RC. For the Record, Beta 2, Flight 6 and Flight 7 all failed.
What websites do you visit every day?
Internal Wiki (all day)
At Lunch Google Sci-Tech News
PlanetMySQL
Google Sci-Tech News
PlanetMySQL (if I haven’t already got to them).
What mobile device or cell phone do you use?
N/A NEC e606 3G phone (which I’ve had for probably 3 years now) operating on a 3G network with video calls, again for 3 years.
Do you use IM?
No. All access is blocked other then an internal Jabber server, that I use rarely, never for communication, just a cut/paste of command or syntax. Speaking of blocked, SSH access to my production servers is blocked, and even reading news like Skypes Call Out is blocked by WebSense. Extensively, however due to current employement policy, I’m very hamstrung unless before/after hours.
I use Gaim as I have AIM, MSN and Google Talk accounts, and Skype. My preference was always AIM, but as clients come and go, I’ve had to accumulate more accounts.
Do you use a VoIP phone?
No. Not at present, however for many years I worked with US clients and used Packet 8. Still have the hard phone somewhere.
I’ve also used Skype talk for one or one or conference calls. Of late in Australia to New Zealand and Singapore. Indeed, the quality to Singapore has been excellent, when living in the US, calls to Singapore on Skype were clearer then my Packet8 phone to US numbers.
Do you have a personal organization/time management theory?
Current contract employees a number of disjointed methods, which in observation just shows so much inefficency, it’s worth documenting just to highlight what not to do. We have daily team meetings (10 mins), each listing your top 2 daily tasks. Weekly we have to also submit weekly goals. Weekly combined meetings with another team where we again give weekly top 2 tasks. We use two seperate systems (with manual double entry) for work identification, one for call centre logging issues, and Bugzilla for software bugs, and enhancements. We also use XPlanner (again duplicating a lot of tasks) for time management.
With all this rigid structure, I am daily given either other work to do, or investigate, and in over 3 months, I would rarely end a week anywhere near where it was so described at the start of the week.
With all the technology possible, I actually do not have any electronic management gadget, never had. I use a combination of notebook, plain paper (usually for daily notes etc, which I discard regularly), a diary, and normally a lot of emails which I normally send to/from home.
Given that email is used so much, I basically use Draft Emails for any electronic notes.

Anything else?
Perhaps there is merit in How I work now, and How I’d like to work now.

Saitek Keyboard

The GWT!


New to the AJAX vertical space is the Google Web Toolkit (GWT) released the the Sun Java One Conference last week.

AJAX (“Asynchronous Javascript and XML”) isn’t new, infact the underlying requirements within AJAX, the DHTML, DOM manipulation and XMLHttpRequest were available in 1997. In fact, I implemented functionality to perform what AJAX does back in the late 90’s, probably starting 1999, using solely Javascript, and some of that is still in use today on at least one of my sites. Of course Google made this functionality popular with it’s use in Google Suggest a few years ago.

So what is GWT? An extract from the Google Web Toolkit Web Page.

Google Web Toolkit (GWT) is a Java software development framework that makes writing AJAX applications like Google Maps and Gmail easy for developers who don’t speak browser quirks as a second language. Writing dynamic web applications today is a tedious and error-prone process; you spend 90% of your time working around subtle incompatibilities between web browsers and platforms, and JavaScript’s lack of modularity makes sharing, testing, and reusing AJAX components difficult and fragile.

GWT lets you avoid many of these headaches while offering your users the same dynamic, standards-compliant experience. You write your front end in the Java programming language, and the GWT compiler converts your Java classes to browser-compliant JavaScript and HTML.

Restyling a Mediwiki Installation – Lesson 1

Following my implementation of UltimateLAMP, read heaps more at this thread, I undertook to provide customizations of a MediaWiki Installation. Here is the first lesson that you can undertake if you wish to beautify the default MediaWiki Installation.

For the purposes of this demonstration, I am going to help out Jay & Colin and propose a restyle the MySQL forge to fit in with the default Home Page. Hopefully you will see it there soon!

Lesson 1 – Updating the default Monobook.css

There are several different ways to make style changes, the simplest is to customize the system wide Monobook.css, and this Lesson will focus on this.

By accessing the link [http://my.wiki.site/]index.php/MediaWiki:Monobook.css you will be able to make the following changes.

The best way to approach this, like any good programming style, make small changes, testing and confirmation and continue.

Note: For all screen prints, click for a larger image

1. Cleanup Backgrounds

body        { background-image: none; background-color: #FFFFFF;}
.portlet    { background-color: #FFFFFF; }

The MediaWiki Page is made up of three sections, these are the represented by styles .portlet, #content, and #footer. For the purposes of our first example, the content section and the footer section are already white.

==>

At this point I should recommend that you use FireFox for future work. You should then install the Web Developer Add-on. What results from this plugin is invaluable information about the internals of a web page. The two options most useful for this exercise is:

  • Information | Display Id and Class Details.
  • Information | Display Element Information (move the cursor around to get information)
==>

2. Cleanup Borders

I don’t feel that borders around things are warranted. I’m more a clean look kinda guy. Remove all borders, say one to separate the footer from the page.

#content    { border-width: 0px; }
.portlet .pBody
            { border-width: 0px; }
#footer     { border-top: 1px solid #888888; border-bottom-width: 0px; }
==>

Maybe, that’s a little too clean. Add some separators on left side options.

#p-navigation,
#p-search,
#p-tb       { border-top: 1px dotted #888888; }
==>

3. Links

Using the Forge Styles http://forge.mysql.com/css/shared.css we can adjust the links accordingly.

a:link      { color: #10688E; text-decoration: none; }
a:hover     { color: #003366; text-decoration: underline; }
a:visited   { color: #106887; text-decoration: none; }

a.new:link  { color: #AA0000; text-decoration: none; }
a.new:hover { color: #AA0000; text-decoration: underline; }

==>

4. Content

Ok, we have played around a little, now to move into some more serious changes. Looking at the general page look and feel that you see in most page content.


html,body,p,td,a,li
            { font: 12px/19px Verdana, "Lucida Grande", "Lucida Sans Unicode", Tahoma, Arial, sans-serif; }

h1          { font: bold 24px Helvetica, Arial, sans-serif; color: #EB694A; letter-spacing: -1px;
              margin: 1.5em 0 0.25em 0;
              border-bottom: 1px dotted #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h2          { font: bold 18px Helvetica, Arial, sans-serif;  color: #EB694A; letter-spacing: -1px;
              margin: 2em 0 0 0;
              border-bottom: 1px dotted  #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h3          { font-size: 12px; color: #6F90B5; }

h4          { font-size: 12px; color: #6F90B5; }

At this time, I’ve created an Example Page to better demonstration of the look and feel, as the default MediaWiki Main page has limited content.

Some Text Content at the start of the page.

= A First Level Heading =
The first paragraph text.
[[Main Page | Internal Link to existing Page]],
[[Nonexistent Page | Internal Link to nonexistent Page]],
[http://forge.mysql.com  External Link],
http://www.mysql.com  (plain text link)

== A Second Level Heading ==
The first paragraph text.

== A Second Level Heading ==
The first paragraph text.
* List Item Line 1
* List Item Line 2
* List Item Line 3
=== A Third Level Heading ===
The first paragraph text.
* List Item Line 1
** Sub Item 1
** Sub Item 1
** Sub Item 1
*** Sub Item 1
*** Sub Item 2
*** Sub Item 3
* List Item Line 2
* List Item Line 3
=== A Third Level Heading ===
The first paragraph text.
# List Item Line 1
## Sub Item 1
## Sub Item 1
## Sub Item 1
### Sub Item 1
### Sub Item 2
### Sub Item 3
# List Item Line 2
# List Item Line 3
== A Second Level Heading ==
The first paragraph text.

== A Second Level Heading ==
The first paragraph text.

==>

5. Table of Contents

By default, the Table of Contents shows at the top of a page when a given amount of sections or content is present (not sure what the trigger is). The issue is, for larger pages, it means you need to scroll down before you can see any page content. You can disable this with the __NOTOC__ option, but a better solution is to position the Table of Contents so as to not interfere with initial content.

#toc        { float: right; margin: 0 0 1em 1em; border: solid 1px #888888; text-color: #EFEFEF; color: #333333; }
#toc td     { padding: 0.5em; }
#toc .tocindent
            { margin-left: 1em; }
#toc .tocline
            { margin-bottom: 0px; }
#toc p      { margin: 0; }
#toc .editsection
            { margin-top: 0.7em;}
==>

Ok, we are about half way there.

6. Menu Options

In order to get a look and feel like the Forge Home Page., we now have to work on the rest of the navigation options at the top of the page above the content. Let’s start with Second Line of Menu Options (I’ll explain more later why).

#p-cactions { padding-right: 0px; margin-right: 0px; background-color: #006486; width: 100%; top: 77px; }
#p-cactions ul
            { margin: 0; padding: 0; list-style: none;  font-size: 85%; margin-left: 10px; }
#p-cactions li
            { float:left; margin:0; padding:0; text-indent:0; border-width: 0px; }
#p-cactions li a
            { display:block;  color:#F7F7F7;  font-weight: bold;
              background-color: #666666;  border:solid 1px #DDDDDD;
              border-width: 0px;  border-left-width:1px;  text-decoration:none; white-space:nowrap;}
#p-cactions li a:hover
            { background-color: #FFBC2F; color: #666666; }
#p-cactions li.selected a
            { background: #EA7F07;  border:none;}
#p-cactions li.selected  a:hover
            { color: #000000; }
==>

7. Top Menu Options

#p-personal .pBody
            { background: #FFFFFF url(http://forge.mysql.com/img/bggradient.jpg) no-repeat top right; }

#p-personal li a,
#p-personal li a.new
            { color: #FFFFFF; text-decoration: none; font-weight: bold; }
#p-personal li a:hover
            { color: #E97B00; background-color: transparent; text-decoration: underline; }
==>

The down side is it should be the same height at the Forge Page. This required a little more work, and other sections had to be adjusted accordingly, hence why I left this to last. (The size is based on the later mention logo height + margins)

#p-personal { height: 62px; }
#p-personal .pBody
            { height: 62px; }
#p-cactions { top: 62px; }
#content    { margin-top: 84px; }

==>

8. Some Miscellaneous Things

  • Deemphasis the text in the footer a little
  • Remove the annoying arrow that occurs after external links
  • Remove the logo from the login link
  • Hide the My Talk link
#footer     { color: #888888; }
#bodyContent a[href ^="http://"]
            { background:inherit !important; padding-right:inherit !important}
li#pt-userpage
            { background: none; }
li#pt-mytalk
            { display: none; }

==>

The Badging

Let us not forget the final step, the logo badging.

This requires a change to a MediaWiki filesystem file

LocalSettings.php.

$wgLogo             = "http://forge.mysql.com/img/mysqllogo.gif";

And the following Style changes.

#p-logo, #p-logo a, #p-logo a:hover
            { width:100px; height: 52px; }
#p-logo     { margin-left: 10px; margin-top: 5px; margin-bottom: 5px; }

#p-cactions { left: 0px; }
#p-cactions ul
            { margin-left: 180px; }

In order to overcome the top options bleeding to white text on white background, I’ve increased the right side fill of the default bggradient image, replacing the appropriate ULR with the following.

#p-personal .pBody
            { background: #FFFFFF url(/images/bggradient.png) no-repeat top right; }
==>

Conclusion

It’s not quite perfect yet, but this shows how it can be done. Some minor things are left, but I’ve run out of time for the few hours I allocated to this.
The end result of monobook.css for this lesson is:

/* edit this file to customize the monobook skin for the entire site */

/* Background Display */

body        { background-image: none; background-color: #FFFFFF;}
.portlet    { background-color: #FFFFFF; }

/* Borders */

#content    { border-width: 0px; }
.portlet .pBody
            { border-width: 0px; }
#footer     { border-top: 1px solid #888888; border-bottom-width: 0px; }

#p-navigation,
#p-search,
#p-tb       { border-top: 1px dotted #888888; }

/* Links */
a:link      { color: #10688E; text-decoration: none; }
a:hover     { color: #003366; text-decoration: underline; }
a:visited   { color: #106887; text-decoration: none; }

a.new:link  { color: #AA0000; text-decoration: none; }
a.new:hover { color: #AA0000; text-decoration: underline; }

a.external:link {color: #000000; text-decoration: none; }
a.external:hover { color: #AA0000; text-decoration: underline; }


/* Page Look & Feel */
html,body,p,td,a,li
            { font: 12px/19px Verdana, "Lucida Grande", "Lucida Sans Unicode", Tahoma, Arial, sans-serif; }

h1          { font: bold 24px Helvetica, Arial, sans-serif; color: #EB694A; letter-spacing: -1px;
              margin: 1.5em 0 0.25em 0;
              border-bottom: 1px dotted #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h2          { font: bold 18px Helvetica, Arial, sans-serif;  color: #EB694A; letter-spacing: -1px;
              margin: 2em 0 0 0;
              border-bottom: 1px dotted  #888888; line-height: 1.1em; padding-bottom: 0.2em; }

h3          { font-size: 12px; color: #6F90B5; }

h4          { font-size: 12px; color: #6F90B5; }

/* Table of Contents */
#toc        { float: right; margin: 0 0 1em 1em; border: solid 1px #888888; #EFEFEF; color: #333333; }
#toc td     { padding: 0.5em; }
#toc .tocindent
            { margin-left: 1em; }
#toc .tocline
            { margin-bottom: 0px; }
#toc p      { margin: 0; }
#toc .editsection
            { margin-top: 0.7em;}


/* Second Line Top Menu Options */

#p-cactions { padding-right: 0px; margin-right: 0px; background-color: #006486; width: 100%; }

#p-cactions ul
            { margin: 0; padding: 0; list-style: none;  font-size: 85%; margin-left: 10px; }

#p-cactions li
            { float:left; margin:0; padding:0; text-indent:0; border-width: 0px; }

#p-cactions li a
            { display:block;  color:#F7F7F7;  font-weight: bold;
              background-color: #666666;  border:solid 1px #DDDDDD;
              border-width: 0px;  border-left-width:1px;  text-decoration:none; white-space:nowrap;}

#p-cactions li a:hover
            { background-color: #FFBC2F; color: #66666; }

#p-cactions li.selected a
            { background: #EA7F07;  border:none;}

#p-cactions li.selected  a:hover
            { color: #000000; }

/* Top Menu Options */
#p-personal .pBody
            { background: #FFFFFF url(/images/bggradient.png) no-repeat top right; }

#p-personal li a,
#p-personal li a.new
            { color: #FFFFFF; text-decoration: none; font-weight: bold; }
#p-personal li a:hover
            { color: #E97B00; background-color: transparent; text-decoration: underline; }

/* Top Menu Height Adjustments */
#p-personal { height: 62px; }
#p-personal .pBody
            { height: 62px; }
#p-cactions { top: 62px; }
#content    { margin-top: 84px; }

/* Minor Things */
#footer     { color: #888888; }
#bodyContent a[href ^="http://"]
            { background:inherit !important; padding-right:inherit !important}
li#pt-userpage
            { background: none; }
li#pt-mytalk
            { display: none; }

/* Badging */
#p-logo, #p-logo a, #p-logo a:hover
            { width:100px; height: 52px; }
#p-logo     { margin-left: 10px; margin-top: 5px; margin-bottom: 5px; }

#p-cactions { left: 0px; }
#p-cactions ul
            { margin-left: 180px; }

References

User Styles Alternative
Gallery of user styles
Skins
LocalSettings.php Style Information
Navigation Bar
User Rights
Wikipedia Monobook
MediaWiki Skin HowTo

The definition of a Unit Test

A Test is not a Unit Test if:

  • It talks to the database
  • It communicates across a network
  • It touches the filesystem
  • It can’t run the same time as any of your other unit tests
  • You have to do special things to your environment to run it (e.g. editing config files)