NoSQL options

The NoSQL event in New York featured a number of presentations on non-relational technologies, including Hadoop, MongoDB and CouchDB.

Coming from a relational background of 20 years with Ingres, Oracle and MySQL, I have been moving my focus towards non-relational data stores. The most obvious and widely used today is memcached, a non-persistent distributed key/value store. There are a number of persistent key/value stores in the marketplace: Tokyo Cabinet, Project Voldemort and Redis, to name a few.

My list of data store products helps to map the complex namespace of products that now exist. One trend is towards schemaless solutions, which are better able to manage dynamically typed or formatted information. The frequent-release approach of the Agile Methodology is simply not achievable with a statically typed relational table/column structure. The impact of constant ALTER TABLE commands in a MySQL database can make your production system unusable while each change runs.
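
To illustrate the schemaless idea, here is a minimal sketch in Python. It is not the API of any of the products named above; the in-memory `store`, the `put`/`get` helpers and the `twitter` field are all hypothetical names chosen for the example. Each record is a free-form document, so a new release can start writing an extra field without any table change, provided readers tolerate older records that lack it.

```python
import json

# Hypothetical in-memory "document store": each record is a free-form dict
# serialized as JSON, so records do not need to share the same fields.
store = {}

def put(key, document):
    """Store a document under a key."""
    store[key] = json.dumps(document)

def get(key):
    """Return the document stored under a key."""
    return json.loads(store[key])

# Release 1 writes only a name.
put("user:1", {"name": "alice"})

# Release 2 adds a field; no ALTER TABLE equivalent is required.
put("user:2", {"name": "bob", "twitter": "@bob"})

def twitter_handle(key):
    # Readers must cope with older records that lack the new field.
    return get(key).get("twitter")
```

The trade-off is that the schema now lives in the application code rather than in the database, which is part of why the design needs care.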

In highly distributed online, and increasingly offline, operation, fault tolerance, data synchronization and eventual consistency are required features in complex topologies such as multi-master.

I advise and promote a technology-agnostic solution when possible. With the use of an API this is actually achievable; however, in order to use a variety of backend data store products, one must consider the design patterns for optimal management. Two factors that support a highly distributed data set are no joins and minimal transactional semantics. The Facebook API is a great example, where there are no joins for their MySQL relational backend. The movement back to a logical and non-normalized schema, or the move towards a totally schemaless solution, does require great thought in the architectural concepts of your application.
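
One way to picture a technology-agnostic API with no joins is the following sketch. It assumes a deliberately tiny interface (`DataStore` with `put` and `get`), two illustrative backends (`MemoryStore` and `SqliteStore`) and a `friend_names` helper; all of these names are mine, not those of any real product or of the Facebook API. The point is that the application only does key lookups, so the "join" happens in application code and the backend can be swapped.

```python
import json
import sqlite3
from abc import ABC, abstractmethod

class DataStore(ABC):
    """Hypothetical minimal API the application codes against."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

class MemoryStore(DataStore):
    """Backend 1: a plain in-process dictionary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

class SqliteStore(DataStore):
    """Backend 2: a relational engine used purely as a key/value table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
            (key, json.dumps(value)))
        self._db.commit()

    def get(self, key):
        row = self._db.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else json.loads(row[0])

def friend_names(store, user_id):
    """Resolve friends with one key lookup each instead of a SQL join."""
    user = store.get(f"user:{user_id}")
    if user is None:
        return []
    names = []
    for friend_id in user["friends"]:
        friend = store.get(f"user:{friend_id}")
        if friend is not None:
            names.append(friend["name"])
    return names
```

Because `friend_names` only calls `get`, the same application code runs unchanged against either backend, at the cost of more round trips than a single joined query.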

Ultimately, feature requirements will dictate the relative strengths and weaknesses of products. Full text search is a good example. CouchDB provides native support via Lucene. Another feature I like in CouchDB is its append-only data model. This makes durability easy and auto-recovery after a crash a non-issue, another feature a transactional relational database cannot achieve.
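
The append-only idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not CouchDB's actual file format or recovery code; the function names `append_record` and `recover` and the one-JSON-object-per-line layout are assumptions made for the example. Existing data is never overwritten, so recovery is simply a replay of the file that stops at the first incomplete record.

```python
import json
import os

def append_record(path, key, value):
    """Append one record to the end of the log; earlier bytes are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"k": key, "v": value}) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the write to disk

def recover(path):
    """Rebuild the current state by replaying the log from the start."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A torn write from a crash: ignore it and everything after it.
                break
            state[record["k"]] = record["v"]  # later records win
    return state
```

A crash in the middle of a write can only damage the tail of the file, so everything before it remains readable. A real implementation would also need compaction, since the file grows with every update.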

With a two-day no:sql(east) conference this month, there is definitely greater interest in this space.

Comments

  1. says

    The NoSQL event was excellent and while this post is a great start one thing I notice is you don’t give a succinct idea of the reasons database developers are moving toward non-relational data stores, such as the obvious scaling issues of performance and the ability to handle ridiculously large data sets (as well as scale elastically) but also non-obvious cases where you can realise significant administrative advantages with a non-relational store that replaces a necessarily more complicated relational cluster…

    Perhaps you could link to a more detailed analysis of the specific cases where non-relational is the way to go!

    cheers,

    Forest Mars

  2. Jonas says

    Hi,

    Just interested, but why does MySQL Cluster rarely make lists as above?
    Ndb is basically a distributed fault-tolerant persistent hash-table, with SQL on top.
    And we’re to my knowledge equally fast or faster than most if using native api and
    reasonably fast if using SQL.

    /Jonas

  3. Monty says

    I agree with Jonas. With non-sql interfaces becoming all the rage, why not write things using Ndb API?

  4. ronald says

    Jonas, in response to your question, here are a few points.

    1. People are looking at non-SQL solutions; this article is about
    stuff that’s not SQL. They want to leverage more agnostic
    means of accessing data. JSON, for example, is language agnostic; it’s a
    specification. SQL doesn’t fall into that category because everybody
    implemented things differently.

    2. MySQL Cluster has historically had 2 huge limitations: it was in-memory
    only, and you couldn’t grow it without complete downtime. Granted, newer
    versions have added disk-based data, and now dynamic add node, but
    these are new features and perhaps they need greater advertisement.

    3. Databases are poor at supporting dynamically changing data, e.g. data
    sources with different columns, or columns that change regularly.
    People are now releasing software frequently, even daily. I’ve seen
    systems where it takes 5 days to modify a table, and the table is also not
    accessible during that time.

    Overall, SQL is a limitation for the types of data people are now
    wanting to collect and use. SQL and relational databases have their
    inherent uses and benefits, but it’s no longer a case of one product to be
    munged into a solution to a problem; it’s about finding the best component for
    the problem at hand.

  5. Jonas says

    Ronald, in response to your response

    1) Re “non SQL” Our internal api (ndbapi, which is used by several of our customers)
    is very much non-SQL.
    It’s a key-row api. Individual rows can be accessed using keys.
    And we support scanning for rows (using several different algorithms)

    3) Re “dynamically changing data”
    – serializing any structure in a blob-column should not induce any overhead
    – we do support some *online* alter tables (e.g add column), i.e where data is *always*
    accessible (and updatable)
