NoSQL options

The NoSQL event in New York had a number of presentations on non-relational technologies, including Hadoop, MongoDB and CouchDB.

Coming from a relational background of 20 years with Ingres, Oracle and MySQL, I have been moving my focus towards non-relational data stores. The most obvious and well used today is memcached, a non-persistent distributed key/value pair store. There are a number of persistent key/value stores in the marketplace: Tokyo Cabinet, Project Voldemort and Redis, to name a few.

My list of data store products helps to identify the complex name space of varying products that now exist. A trend is towards schemaless solutions and the ability to better manage dynamically typed or formatted information; an Agile Methodology release approach is simply not achievable with a statically typed relational database table/column structure. The impact of constant ALTER TABLE commands in a MySQL database makes your production system unusable.
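A minimal sketch of the schemaless idea, in Python: each record is stored as a JSON document in a single text column, so a new attribute needs no ALTER TABLE. The table name `doc` and the helpers `save` and `load` are hypothetical, and SQLite is used only because it ships with Python; it is an illustration, not any particular product's storage model.

```python
import json
import sqlite3

# One generic table: an id plus an opaque JSON body (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def save(doc_id, doc):
    """Store a document; its attributes are free to differ from other rows."""
    conn.execute(
        "INSERT OR REPLACE INTO doc (id, body) VALUES (?, ?)",
        (doc_id, json.dumps(doc)),
    )


def load(doc_id):
    """Return the stored document as a dict, or None if it does not exist."""
    row = conn.execute("SELECT body FROM doc WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else None


save("u1", {"name": "alice"})
save("u2", {"name": "bob", "twitter": "@bob"})  # new attribute, no schema change
print(load("u1"), load("u2"))
```

The trade-off is that the database can no longer index or constrain those attributes itself; that work moves into the application.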

In highly distributed online, and increasingly offline, operations, fault tolerance, data synchronization and eventual consistency are required features in complex topologies such as multi-master.

I advise and promote a technology agnostic solution when possible. With the use of an API this is actually achievable; however, in order to use a variety of backend data store products, one must consider the design patterns for optimal management. Two factors that support a highly distributed data set are no joins and minimal transactional semantics. The Facebook API is a great example, where there are no joins for their MySQL relational backend. The movement back to a logical and non-normalized schema, or the move towards a totally schemaless solution, does require great thought in the architectural concepts of your application.
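To make the API point concrete, here is a minimal Python sketch, assuming a hypothetical two-call interface (`KeyValueStore`) that the application codes against. The in-memory backend is a stand-in; a memcached, Redis or MySQL adapter would expose the same calls. The "join" is done in the application as a series of key lookups.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class KeyValueStore(ABC):
    """Hypothetical minimal interface; not any product's real API."""

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...


class InMemoryStore(KeyValueStore):
    """Stand-in backend used only for this illustration."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value


def friends_of(store: KeyValueStore, user_id: str) -> List[dict]:
    """Application-side 'join': one lookup for the user, then one per friend."""
    user = store.get("user:" + user_id) or {}
    friends = (store.get("user:" + fid) for fid in user.get("friend_ids", []))
    return [f for f in friends if f is not None]  # skip dangling references


store = InMemoryStore()
store.put("user:1", {"name": "alice", "friend_ids": ["2", "3"]})
store.put("user:2", {"name": "bob", "friend_ids": []})
print([f["name"] for f in friends_of(store, "1")])
```

Because `friends_of` only depends on the interface, swapping the backend does not touch application code, and a missing friend record is tolerated rather than treated as a broken foreign key.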

Ultimately, feature requirements will dictate the relative strengths and weaknesses of products. Full text search is a good example: CouchDB provides native support via Lucene. Another feature I like in CouchDB is its append-only data mode. This makes durability easy and auto-recovery after a crash a non-issue, another feature a transactional relational database cannot achieve.
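Why append-only writing makes crash recovery simple can be shown with a small Python sketch. This is not CouchDB's on-disk format; the class `AppendOnlyStore`, the one-JSON-record-per-line layout and the file name are all assumptions made for illustration. Records are only ever added at the end of the file, so recovery is a replay of the file that discards a torn final record.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key/value store backed by an append-only log (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.index = {}
        self._recover()

    def _recover(self):
        """Replay the log; cut off anything after the last complete record."""
        if not os.path.exists(self.path):
            return
        good = 0  # byte offset of the end of the last complete record
        with open(self.path, "r+b") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # final write was interrupted
                try:
                    record = json.loads(raw.decode("utf-8"))
                except ValueError:
                    break  # corrupt record: stop replaying here
                self.index[record["key"]] = record["value"]
                good += len(raw)
            f.truncate(good)

    def put(self, key, value):
        """Append a new record; existing bytes are never rewritten."""
        line = json.dumps({"key": key, "value": value}) + "\n"
        with open(self.path, "ab") as f:
            f.write(line.encode("utf-8"))
            f.flush()
            os.fsync(f.fileno())
        self.index[key] = value

    def get(self, key):
        return self.index.get(key)


path = os.path.join(tempfile.mkdtemp(), "data.log")
db = AppendOnlyStore(path)
db.put("a", 1)
db.put("a", 2)
with open(path, "ab") as f:  # simulate a crash part way through a write
    f.write(b'{"key": "b", "val')
recovered = AppendOnlyStore(path)
print(recovered.get("a"), recovered.get("b"))
```

The latest complete value for each key wins on replay, and the half-written record is dropped without any separate repair step.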

With a 2 day no:sql(east) conference this month, there is definitely greater interest in this space.

Problems compiling MySQL 5.4

It seems that in the year Sun had to improve MySQL, and with an entirely new 5.4 branch, the development team could not fix the autoconf and compile dependency problems that have been in MySQL for all the years I’ve been compiling it. Drizzle has got it right, thanks to the great work of Monty Taylor.

I’m working on the Wafflegrid AWS EC2 AMIs for Matt Yonkovit, and while compiling 5.1 was straightforward under Ubuntu 8.10 Intrepid, compiling 5.4 was more complicated.

For MySQL 5.1 I needed only to do the following:

apt-get install -y build-essential
apt-get install libncurses5-dev
./configure
make
make install

For MySQL 5.4, I elected to use the BUILD scripts (based on Wafflegrid recommendations). That didn’t go far before I needed:

apt-get install -y automake libtool

You then have to spend 10+ minutes compiling MySQL 5.4 only to get an obscure error, and then consider what dependencies may be missing.
I don’t like to do a blanket apt-get of a long list of proposed packages unless I know they are actually needed.

The error was:

make[1]: Entering directory `/src/mysql-5.4.0-beta/sql'
make[1]: warning: -jN forced in submake: disabling jobserver mode.
/bin/bash ../ylwrap sql_yacc.yy y.tab.c sql_yacc.cc y.tab.h sql_yacc.h y.output sql_yacc.output -- -d --verbose
make -j 6 gen_lex_hash
make[2]: Entering directory `/src/mysql-5.4.0-beta/sql'
rm -f mini_client_errors.c
/bin/ln -s ../libmysql/errmsg.c mini_client_errors.c
make[2]: warning: -jN forced in submake: disabling jobserver mode.
rm -f pack.c
../ylwrap: line 111: -d: command not found
/bin/ln -s ../sql-common/pack.c pack.c
....
make[1]: Leaving directory `/src/mysql-5.4.0-beta/sql'
make: *** [all-recursive] Error 1

What a lovely error: ../ylwrap: line 111: -d: command not found

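ylwrap is the wrapper script automake uses to invoke yacc, and in this instance yacc is not even installed by default. With no yacc command to run, the wrapper ends up trying to execute the -d option itself. I’ve compiled MySQL long enough to know that it requires yacc, and actually bison, but do you think it would hurt if configure told the user this?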
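The kind of up-front check I am asking for can be sketched in a few lines of Python. This is not what MySQL's configure does; the function `missing_tools` is hypothetical, and the tool list is an assumption drawn only from the packages mentioned in this post (build-essential, automake, libtool, and yacc or bison).

```python
import shutil


def missing_tools(required, which=shutil.which):
    """Return the requirements for which no alternative is on the PATH.

    `required` is a list of tuples; each tuple lists acceptable alternatives.
    `which` is injectable so the check can be exercised without the tools.
    """
    missing = []
    for alternatives in required:
        if not any(which(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing


# Assumed list, based on the packages named in this post.
REQUIRED = [
    ("gcc", "cc"),
    ("make",),
    ("automake",),
    ("libtool", "libtoolize"),
    ("bison", "yacc"),
]

problems = missing_tools(REQUIRED)
if problems:
    print("Missing build tools: " + ", ".join(problems))
else:
    print("All required build tools found")
```

Running this before a 10+ minute build reports a missing bison or yacc in under a second, instead of surfacing it as a ylwrap error deep in the sql directory.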

It’s also been some time since I’ve compiled MySQL source, rather focusing on Drizzle. I had forgotten just how many compile warnings MySQL throws. Granted a warning is not an error, but you should not just ignore them in building a quality product.

Drizzle now available on Mosso

Mosso, the Rackspace Cloud, now has a Drizzle developer image much like the first Drizzle AMI on EC2.

The Mosso interface is definitely different. It’s a GUI, and while I definitely prefer a CLI, it offers simpler navigation for a new user. I suspect an API may be available.

I had an issue with the backup process, or more precisely the lack of feedback. The Knowledge Base didn’t help, and both calling and Live Chat ultimately directed me to the same person. I also found a bug in the backup process: you are able to select an incomplete backup to try and launch a new server. I talked to Support about it and apparently it is already known.

And in true open source form, the Drizzle version is actually one point higher than yesterday’s AWS image.

I don’t know how to *publish* this backup so others can try it. Something on the list of things to do, however I was able to verify my backup with a new instance.

$ drizzle
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 2009.04.998 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle>
drizzle> select version();
+-------------+
| version()   |
+-------------+
| 2009.04.998 |
+-------------+
1 row in set (0 sec)

drizzle> select count(*) from sakila.film;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.18 sec)

Announcing Drizzle on EC2

I have published the very first sharable Drizzle Amazon Machine Image (AMI) for AWS EC2, based on the good feedback from my discussion at the Drizzle Developer Day on what options we should try.

This first version is a 32bit Developer instance, showcasing Drizzle and all necessary developer tools to build Drizzle from source.

What you will find on drizzle-ami/intrepid-dev32 – ami-b858bfd1

Ubuntu 8.10 Intrepid 32 bit base server installation:

  • build tools
  • drizzle dependencies
  • bzr 1.31.1

From the respective source trees the following software is available:

  • drizzle 2009.04.997
  • libdrizzle 0.0.2
  • gearman 0.0.4
  • memcached 1.2.8
  • libmemcached 0.28

Drizzle has been configured with necessary dependencies for PAM authentication, http_auth, libgearman and MD5 but these don’t seem to be available in the binary distribution.

I will be creating additional AMI’s including 64bit and LAMP ready binary only images.

The following example shows using drizzle on this AMI. Some further work is necessary for full automation, parameters and logging. I’ve raised a number of issues the Drizzle Developers are now hard at work on.

1. Starting Drizzle

ssh [email protected]
sudo /etc/init.d/drizzle-server.init start &

2. Testing Drizzle (the sakila database has been installed)

$ drizzle
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 4
Server version: 2009.04.997 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> select version();
+-------------+
| version()   |
+-------------+
| 2009.04.997 |
+-------------+
1 row in set (0 sec)

drizzle> select count(*) from sakila.film;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0 sec)

3. Compiling Drizzle

sudo su - drizzle
ls
deploy  drizzle  libdrizzle  sakila-drizzle
cd drizzle
./configure --help
Description of plugins:

   === HTTP Authentication Plugin ===
  Plugin Name:      auth_http
  Description:      HTTP based authentications
  Supports build:   static and dynamic

   === PAM Authenication Plugin ===
  Plugin Name:      auth_pam
  Description:      PAM based authenication.
  Supports build:   dynamic

   === compression UDFs ===
  Plugin Name:      compression
  Description:      UDF Plugin for compression
  Supports build:   static and dynamic
  Status:           mandatory

   === crc32 UDF ===
  Plugin Name:      crc32
  Description:      UDF Plugin for crc32
  Supports build:   static and dynamic
  Status:           mandatory

   === Error Message Plugin ===
  Plugin Name:      errmsg_stderr
  Description:      Errmsg Plugin that sends messages to stderr.
  Supports build:   dynamic

   === Daemon Example Plugin ===
  Plugin Name:      hello_world
  Description:      UDF Plugin for Hello World.
  Supports build:   dynamic

   === Gearman Logging Plugin ===
  Plugin Name:      logging_gearman
  Description:      Logging Plugin that logs to Gearman.
  Supports build:   dynamic

   === Query Logging Plugin ===
  Plugin Name:      logging_query
  Description:      Logging Plugin that logs all queries.
  Supports build:   static and dynamic
  Status:           mandatory

   === Syslog Logging Plugin ===
  Plugin Name:      logging_syslog
  Description:      Logging Plugin that writes to syslog.
  Supports build:   static and dynamic
  Status:           mandatory

   === MD5 UDF ===
  Plugin Name:      md5
  Description:      UDF Plugin for MD5
  Supports build:   static and dynamic

   === One Thread Per Connection Scheduler ===
  Plugin Name:      multi_thread
  Description:      plugin for multi_thread
  Supports build:   static
  Status:           mandatory

   === Old libdrizzle Protocol ===
  Plugin Name:      oldlibdrizzle
  Description:      plugin for oldlibdrizzle
  Supports build:   static
  Status:           mandatory

   === Pool of Threads Scheduler ===
  Plugin Name:      pool_of_threads
  Description:      plugin for pool_of_threads
  Supports build:   static
  Status:           mandatory

   === Default Signal Handler ===
  Plugin Name:      signal_handler
  Description:      plugin for signal_handler
  Supports build:   static
  Status:           mandatory

   === Single Thread Scheduler ===
  Plugin Name:      single_thread
  Description:      plugin for single_thread
  Supports build:   static
  Status:           mandatory

   === Archive Storage Engine ===
  Plugin Name:      archive
  Description:      Archive Storage Engine
  Supports build:   static
  Status:           mandatory

   === Blackhole Storage Engine ===
  Plugin Name:      blackhole
  Description:      Basic Write-only Read-never tables
  Supports build:   static and dynamic
  Configurations:   max, max-no-ndb

   === CSV Storage Engine ===
  Plugin Name:      csv
  Description:      Stores tables in text CSV format
  Supports build:   static
  Status:           mandatory

   === Memory Storage Engine ===
  Plugin Name:      heap
  Description:      Volatile memory based tables
  Supports build:   static
  Status:           mandatory

   === InnoDB Storage Engine ===
  Plugin Name:      innobase
  Description:      Transactional Tables using InnoDB
  Supports build:   static and dynamic
  Configurations:   max, max-no-ndb
  Status:           mandatory

   === MyISAM Storage Engine ===
  Plugin Name:      myisam
  Description:      Traditional non-transactional MySQL tables
  Supports build:   static
  Status:           mandatory


Report bugs to <http://bugs.launchpad.net/drizzle>.

Setting up MySQL on Amazon Web Services (AWS) Presentation

On Tuesday at the MySQL Camp 2009 in Santa Clara I presented Setting up MySQL on Amazon Web Services (AWS).

This presentation assumed you know nothing about AWS, and have no account. With Internet access via a Browser and a valid Credit Card, you can have your own running Web Server on the Internet in under 10 minutes, just point and click.

We also stepped into more detail, using both the online point and click interface and the supplied command line tools to demonstrate some more advanced usage.

Your Code, Your Community, Your Cloud… Project Kenai

Following the opening keynote announcement about Kenai I ventured into a talk on Project Kenai.

In today’s economy the drive towards efficiency is certainly a key consideration; it was quoted that dedicated hosting servers only run at 30% efficiency.

An overview again of Cloud Computing

  • Economics – Pay as you go,
  • Developer Centric – rapid self provisioning, api-driven, faster deployment
  • Flexibility – standard services, elastic, on demand, multi-tenant

Types of Clouds

  • Public – pay as you go, multi-tenant application and services
  • Private – Cloud computing model run within a company’s own data center
  • Mixed – Mixed use of public and private clouds according to applications

SmugMug was referenced as a Mixed Cloud example.

Cloud Layers

  • Infrastructure as a Service – Basic storage and compute capabilities offered as a service, e.g. AWS
  • Platform as a Service – Developer platform with built-in services, e.g. Google App Engine
  • Software as a Service – Applications offered on demand over the network, e.g. salesforce.com

Some issues raised about these layers included:

  • IaaS issues include Service Level, Privacy, Security, Cost of Exit
  • PaaS raised an interesting point, one that is the bane of MySQL performance tuning: instrumentation
  • SaaS means there is nothing you need to download; you take the pieces you need and interact with the cloud. More services will simply be like doing your tax online.

Sun offers Project Kenai as well as Zembly.

Project Kenai

  • A platform and ecosystem for developers.
  • Freely host open source projects and code.
  • Connect, communicate, collaborate and code with peers
  • Eventually easily deploy application/services to “clouds”

Kenai Features

  • Code Repository with SVN, Mercurial, or an external repository
  • Issue tracking with bugzilla, jira
  • collaboration tools such as wiki, forums, mailing lists
  • document hosting
  • your profile
  • administrative role

Within Kenai you can open up to 5 open source projects, with various metrics of the repositories, issue trackers, wiki etc.

The benefit given was that the features are integrated into your project, not distributed across different sites. Agile development within the project sees a release every 2 weeks. Integration with NetBeans and Eclipse is underway.

Kenai is targeted as being the core of the next generation of Sun’s collaboration tools. However when I asked for more details about uptake in Sun, it’s only a request, not a requirement for internal teams.

The API’s for the Sun Cloud are at http://kenai.com/projects/suncloudapis.

Event: CommunityOne East in New York, NY.
Presenter: Tori Wieldt, Sun Microsystems
Article Author: Ronald Bradford

Everybody is talking About Clouds

From the opening keynote at CommunityOne East we begin with Everybody is talking About Clouds.

It’s difficult to get a good definition. The opening cloud definition today was Software/Platform/Storage/Database/Infrastructure as a service, Grid Computing, Virtualization, Utility Computing and Application Hosting: basically all the buzz words we currently know.

Cloud computing has the ideal of truly bringing freedom of choice. Inside or outside an enterprise, lowering the barrier, time and cost of that choice gives opportunities including:

  • Self-service provisioning
  • Scale up, Scale down.
  • Pay for only what you use.

Sun’s Vision has existed since 1984 with “The NETWORK is the Computer”.

Today, Sun’s View includes Many Clouds, Public and Private, Tuned up for different application needs, geographical, political, with a goal of being Open and Compatible.

How do we think into the future for developing and deploying into the cloud? The answer given today was, The Sun Open Cloud Platform which includes the set of core technologies, API’s and protocols that Sun hopes to see uptake among many different providers.

The Sun Cloud Platform

  • Products and Technologies – VirtualBox, Sun xVM, Q-layer, MySQL
  • Expertise and Services
  • Partners – Zmanda, RightScale, KickApps
  • Open Communities – GlassFish, Java, OpenOffice, ZFS, NetBeans, Eucalyptus

The Sun Cloud includes:

  • Compute Service
  • Storage Service
  • Virtual Data Center
  • Open API – Public, RESTful, Java, Python, Ruby

The public API has been released today and is available under Kenai. It includes two key points:

  • Everything is a resource, accessed via HTTP GET, POST, PUT etc.
  • A single starting point; other URIs are discoverable.
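Those two points can be illustrated with a short Python sketch. It does not use the real Sun Cloud API: the URIs, the `links` field and the in-memory `FAKE_CLOUD` dictionary standing in for HTTP GET responses are all hypothetical. The client is given only the starting URI and finds everything else by following links in each representation.

```python
from typing import Callable, Dict, List

# Hypothetical representations a server might return for GET on each URI.
FAKE_CLOUD: Dict[str, dict] = {
    "/": {"name": "cloud", "links": {"vdcs": "/vdcs", "storage": "/storage"}},
    "/vdcs": {"name": "virtual data centers", "links": {"vdc-1": "/vdcs/1"}},
    "/vdcs/1": {"name": "vdc-1", "links": {}},
    "/storage": {"name": "storage", "links": {}},
}


def discover(start: str, get: Callable[[str], dict]) -> List[str]:
    """Breadth-first walk from the single entry point, following links."""
    seen: List[str] = []
    queue = [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue  # already visited, avoids loops
        seen.append(uri)
        queue.extend(get(uri)["links"].values())
    return seen


# `get` would be an HTTP GET in a real client; here it is a dict lookup.
print(discover("/", FAKE_CLOUD.__getitem__))
```

Since the client hard-codes only the entry point, the provider can rearrange or extend the URI space without breaking it.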

What was initially shown were CLI examples (great to see this is still common). A demonstration using drag and drop via a web interface was also given, showing a load balanced, multi-tiered, multi-server environment. This was started and tested during the presentation.

Then, using Cyberduck (a WebDAV client on Mac OS X), the storage component at storage.network.com was accessed directly, and from OpenOffice you now get options to Get/Save to Cloud (using the TwoGuys.com Virtual Data Center example document).

Seamless integration between the tools, and the service. That was impressive.

More information is at sun.com/cloud. You can also get more details in the official press release, Sun Microsystems Unveils Open Cloud Platform.

Event: CommunityOne East in New York, NY.
Article Author: Ronald Bradford

Extending application data to the cloud

I was one of the invited panel speakers to A panel on Cloud Computing this week in New York. As one of 2 non vendor presenters, it was a great experience to be invited and be involved with vendors.

While I never got to use my slides available here, I did get to both present certain content, and indeed questions and discussions on the night were on other points of my content.

Cloud computing is here, but it’s early days and new players will continue to emerge. For example, on the panel there was AppNexus, reviewed favorably at InfoWorld in comparison with EC2 and Google App Engine; 10gen, an open source stack solution; and Kaavo, which from an initial 60 seconds of playing provides a management service on top of AWS similar to what ElasticFox provides. I need to investigate further how far the feature set extends and whether it would compete with others such as RightScale.

The greatest mystery came from Hank Williams and his stealth Kloudshare. He did elaborate more on where they aim to provide services. A new term discussed was “Tools as a service”, akin to moving us metaphorically from writing in Assembly language to the advanced frameworks of today’s generation of languages such as Java and Ruby.

Thanks to Murat Aktihanoglu of Unype who chaired the event.

Your data and the cloud

I will be speaking on July 29th in New York at an Entrepreneurs Forum on A Free Panel on Cloud Computing. With a number of experts including Hank Williams of KloudShare, Mike Nolet of AppNexus, and Hans Zaunere of New York PHP fame, it should be a great event.

The focus of my presentation will be on “Extending existing applications to leverage the cloud” where I will be discussing both the advantages of the cloud, and the complexities and issues that you will encounter such as data management, data consistency, loss of control, security and latency for example.

Using traditional MySQL based applications I’ll be providing an approach that can lead to your application gaining greater power of cloud computing.


About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

Setting up on EC2

Thanks to my friend Dustin, and his EC2 demo using Elasticfox Firefox Extension for Amazon EC2 I got an EC2 image setup. With other references Link 1,Link 2,Link 3 I was also able to create my own AMI.

Some notes specific for my configuration.

Pre-configure the ElasticFox key so that SSH connections can be launched directly from ElasticFox.

mkdir ~/ec2-keys
mv ~/Downloads/elasticfox.pem ~/ec2-keys/id_elasticfox
chmod 600 ~/ec2-keys/id_elasticfox
chmod 700 ~/ec2-keys/
ssh -i /Users/rbradfor/ec2-keys/id_elasticfox [email protected]

Installed Software.

apt-get update
apt-get -y autoremove
apt-get -y install apache2
apt-get -y install mysql-server
# Prompts for password (very annoying)
apt-get -y install php5
apache2ctl graceful
echo "Hello World" > /var/www/index.html
echo "<?php phpinfo(); ?>" > /var/www/phpinfo.php

Configuration to save AMI.

scp -i ~/ec2-keys/id_elasticfox ~/ec2-keys/id_elasticfox pk-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem cert-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem [email protected]:/mnt
ec2-bundle-vol -d /mnt -c cert-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem -k pk-CHK7DP4475BWUKIUF4WFDIW3VMYDYOHQ.pem -u AccountNumber -r i386 -p ubuntu804_lamp
ec2-upload-bundle -b rbradford_804_lamp_ami -m /mnt/ubuntu804_lamp.manifest.xml -a AccessID -s SecretKey

Working with Google App Engine

Yesterday I took a more serious look at Google App Engine, having got a developer account some weeks ago.

After going through the getting started demo some time ago, I chose an idea for a Facebook application and started in true eXtreme Programming (XP) style (i.e. what’s the bare minimum required for a first iteration?). I taught myself some Python and within just a few minutes had some working data being randomly generated, totally within the development SDK environment on my MacBook. I was not able to deploy initially via the big blue deploy button; the catch is you have to register the application manually online first.

Then it all worked, and hey presto, I’ve got my application up on the provided domain hosting at appspot.com.

Having come from a truly relational environment, most notably MySQL in recent years, I found the Datastore API different in a number of ways.

  • There is no means of Sequences/Auto Increment. There is an internal unique key, but it’s a String, not an integer, so I could not re-use it.
  • The ListProperty enables Python lists (like arrays) to be easily stored.
  • The ReferenceProperty is used as a foreign key relationship, and can then be used to follow references within an object hierarchy
  • I really missed an interactive interface. You have no ability to look at your data; specifically, I wanted to seek some data and then delete some data, but I had to do all this via code.

Having developed a skeleton Facebook application before in PHP, I figured a Python version would not be that much more work, but here is where I got stumped. The information at Hosting a Facebook Application on Google AppEngine, leveraging the PyFacebook project, didn’t enable me to integrate Google App Engine with Facebook just yet.

This had me thinking I need to resort to a simple standalone Python Facebook application to confirm the PyFacebook usage. Now my problems started. Under Mac it’s a lot more complex to install and configure Python/Django etc. than under Linux. I tried to do it on my dedicated server, but drat, Python is at 2.3.4, and it seems 2.5.x is needed.

Still it was a valuable exercise, I dropped the FaceBook goal and just worked on more Google App Engine stuff. Still early days, but it was productive to try out this new technology.

What I need to work on now is how to hold state within the Python infrastructure so I can manage a user login, and store and retrieve user data for my sample app.
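One common way to hold login state without server-side sessions is a signed cookie value. The following is a minimal sketch using only the Python standard library, not the App Engine API; the names `SECRET`, `sign` and `verify` are hypothetical, and the code is written for current Python rather than the 2.5 runtime mentioned above.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real application would keep this private.
SECRET = b"change-me"


def sign(user_id, secret=SECRET):
    """Return 'user_id|mac', suitable for storing in a cookie."""
    mac = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return user_id + "|" + mac


def verify(token, secret=SECRET):
    """Return the user id if the signature matches, otherwise None."""
    user_id, sep, mac = token.rpartition("|")
    if not sep:
        return None  # no separator, so not a token produced by sign()
    expected = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking how many characters matched.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return user_id
    return None


token = sign("ronald")
print(verify(token), verify("ronald|forged"))
```

The verified user id can then serve as the key for storing and retrieving that user's data, while the server itself stays stateless between requests.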