Your data and the cloud

I will be speaking on July 29th in New York at an Entrepreneurs Forum on A Free Panel on Cloud Computing. With a number of experts including Hank Williams of KloudShare, Mike Nolet of AppNexus, and Hans Zaunere of New York PHP fame is should be a great event.

The focus of my presentation will be on “Extending existing applications to leverage the cloud” where I will be discussing both the advantages of the cloud, and the complexities and issues that you will encounter such as data management, data consistency, loss of control, security and latency for example.

Using traditional MySQL based applications I’ll be providing an approach that can lead to your application gaining greater power of cloud computing.

About the Author

Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into being able to provide solutions to problems. For more information Contact Ronald.

Getting Started with Simple DB

With my continued investigation of evaluating alternative data management with cloud computing options, I’m now evaluating Amazon Simple DB. Still in restricted beta, it helps to have a friend on the inside.

Working through the Getting Started Guide (API Version 2007-11-07) was ok, annoying in parts. Here are some issues I found. I was working with Java as the programming language.

  • The Docs enable you to view the language syntax in Java, C#, Perl, PHP, VB.NET, ScratchPad. You can also restrict the view to a specific language. A rather cool feature. One observation is there is no Python, which is rather ironic as my first investigation was Google App Engine (GAE), and the only language here is Python. Something I had to learn first.
  • Preparing the Samples asks you download the Amazon SimpleDB Sample code, but this is not actually a link to the sample code but an index to Community Code. I used Java Library for Amazon SimpleDB which wasn’t even on the first page of results.
  • The supplied docs for specifying the Classpath was rather wrong, helps to simply find all .jar files and included these. Mine looks like:
    • #!/bin/sh
  • All examples in the docs then refer to making changes such as “invokeCreateDomain(service, action); line and add the following lines after // @TODO: set action parameters here:”, problem is all the samples don’t have the action variable, but rather a variable called request. The comment in the code ” // @TODO: set request parameters here” is at least accurate.
  • The docs contain a lot of Java syntax that is would not for example compile correctly. Plenty of occurances of a missing semicolon ‘;’
  • Each example defines
    String accessKeyId = ““;
    String secretAccessKey = “
    Ok for the first example, but as soon as I moved to the second, I re-factored these into a Interface called Constants.
  • In all the examples, they never provide any sample output, this would help just to confirm stuff. In Java Library for Amazon SimpleDB download link, there is an example output, but it’s outdated, with new data attribute called BoxUsage. My output is:
    • CreateDomain Action Response
  • And now some specifics. In a Relational database such as MySQL, you have Instance/Schema/Table/Column. Within SimpleDB it would indicate that you need a separate AWS account for Instance management. That’s probably a good thing as it will enable tracking of costs. There appears to be no concept of a Schema. Data is stored in Domains, this is the equivalent to a Table. Within each Domain, you specific Attributes, a correlation with Columns. One key difference is the ability to define a set of Attributes with the same identity (much like a list that is supported via Python/GAE). For any row of data, you must specify an itemName, this being equivalent to a Primary Key.These names table me back to old days (20 years ago) of Logical Data Models that used entities,attributes and relationships.
  • The term Replace is used when updating data for a given row.
  • When retrieving data, you first return a list of itemNames, then you can via Attributes for that given item.
  • You can perform a simple where qualification using a Query Expression, including against multiple Attributes via intersection syntax
  • A observation that is of significant concern is the lack of security against any type of operation. The Getting Started guide ends with Deleting the Domain. Is there no means to define permissions against type of users, such as an Application User, and a Database Administrator for managing the objects.

Well it took me longer to write this post, then to run through the example, but at least on a lazy Sunday afternoon, a first look at SimpleDB was quite simple.

I did also run into an error initially. I first started just via CLI under Linux (CentOS 5), but switched back to installing Eclipse on Mac OS/X for better error management, and of course this error didn’t occur.