Giving control of your data to the cloud

I’ve been doing some research and evaluation of more cloud computing. Specifically my focus has been on data store, and considering how to augment an existing operation using a popular database such as MySQL .

I’ve been looking first at Google App Engine and now I have my SimpleDB Beta today will be looking here next.

Some observations I’ve struggled with are:

  • No Native CLI, say for basic data setup. You can do some programmatic input for SELECT statements in a Query object in a SQL like syntax called GQL, but you can’t do DML
  • No simple data viewer. In production you would not do this, but I’m in evaluation and still looking at functionality, verification of results etc. A phpmyadmin clone is what I’m seeking for example. I suspect this would have been a good Google Summer of Code project.
  • Python only. While this was great for me to need to spend a 1/2 day to learn about Python syntax, it was just another small starting hurdle. If your organization doesn’t do or use Python, this is another skill or resource needed.

But the biggest concern and hurdle I’m understanding from my traditional principles is loss of control. Loss of control for monitoring and instrumentation, performance, availability and backup and recovery.

Recent issues in performance and unavailability have highlighted that App Engine is not suitable for mission critical web sites, not as your primary focus. I see huge benefits in augmenting access to information, perhaps more historical for example, a well define API within your application could easily support options to consider cloud storage as a secondary storage and primary retrieval of less important data.

My focus when revisiting here will be looking at means of object translate between tables and the Data Store and maybe an API for data transfer etc.

I suspect that when I get to evaluating EC2/S3 more I will have much more support by being able to leverage existing tools and techniques.

Tagged with: Databases MySQL

Related Posts

More CPUs or Newer CPUs

In a CPU-bound database workload, regardless of price, would you scale-up or scale-new? What if price was the driving factor, would you scale-up or scale-new? I am using as a baseline the first available AWS Graviton2 processor for RDS (r6g).

Read more

An Interesting Artifact with AWS RDS Aurora Storage

As part of using public datasets with my own Benchmarking Suite I wanted upsize a dataset for larger volume testing. I have always used the INFORMATION_SCHEMA.TABLES data_length and index_length columns as a sufficiently accurate measurement for actual disk space used.

Read more

How long does it take the ReadySet cache to warm up?

During my setup of benchmarking I run a quick test-sysbench script to ensure my configuration is right before running an hour+ duration test. When pointing to a Readyset cache where I have cached the 5 queries used in the sysbench test, but I have not run any execution of the SQL, throughput went up 10x in 5 seconds.

Read more