NoSQL options

The NoSQL event in New York had a number of presentations on non relational technologies including of Hadoop, MongoDB and CouchDB.

Coming historically from a relational background of 20 years with Ingres, Oracle and MySQL I have been moving my focus towards non relational data store. The most obvious and well used today is memcached, a non persistent distributed key/value pair store. There are a number of persistent key/value stores in the marketplace, Tokyo Cabinet, Project Voldemort and Redis to name a few.

My list of data store products helps to identify the complex name space of varying products that now exist. A trend is towards schema less solutions, the ability to better manage dynamically typed/formatted information and the Agile Methodology release approach is simply non achievable in a statically type relational database table/column structure. The impact of constant ALTER TABLE commands in a MySQL database makes your production system unusable.

In a highly distribute online and increasing offline operation, fault tolerance and data synchronization and eventual consistency are required features in complex topologies such as multi-master.

I advise and promote a technology agnostic solution when possible. With the use of an API this is actually achievable, however in order to use a variety of backend data store products, one must consider the design patterns for optimal management. Two factors to support a highly distributed data set are no joins and minimal transactional semantics. The Facebook API is a great example, where there are no joins for their MySQL Relational backend. The movement back to a logical and non-normalized schema, or move towards a totally schemaless solution do require great though in the architectural concepts of your application.

Ultimately feature requirements will dictate the relative strengths and weaknesses of products. Full text search is a good example. CouchDB provides native support via Lucene. Another feature I like of couchDB is its append only data mode. This makes durability easy, and auto-recovery after crash a non issue, another feature a transactional relational database can not achieve.

With a 2 day no:sql(east) conference this month, there is definitely greater interest in this space.

multi-threaded memcached

I discovered while compiling Wafflegrid today that by default, the Ubuntu binaries for memcached are not-multithreaded.

Following the installation of memcached from apt-get and libmemcached I ran memslap for:

$ memslap -s localhost
    Threads connecting to servers 1
    Took 1.633 seconds to load data

$ memstat -s localhost
Listing 1 Server

Server: localhost (11211)
     pid: 23868
     uptime: 54
     time: 1244575816
     version: 1.2.2
     pointer_size: 32
     rusage_user: 0.90000
     rusage_system: 0.120000
     curr_items: 10000
     total_items: 10000
     bytes: 5430000
     curr_connections: 1
     total_connections: 3
     connection_structures: 2
     cmd_get: 0
     cmd_set: 10000
     get_hits: 0
     get_misses: 0
     evictions: 0
     bytes_read: 5430000
     bytes_written: 5430000
     limit_maxbytes: 0
     threads: 1

By installed the Latest RC 1.4.0 we see.

memslap -s localhost
    Threads connecting to servers 1
    Took 0.866 seconds to load data

memstat -s localhost

Listing 1 Server

Server: localhost (11211)
     pid: 8651
     uptime: 375
     time: 1244577237
     version: 1.4.0-rc1
     pointer_size: 32
     rusage_user: 0.110000
     rusage_system: 0.130000
     curr_items: 10000
     total_items: 10000
     bytes: 5510000
     curr_connections: 5
     total_connections: 8
     connection_structures: 6
     cmd_get: 0
     cmd_set: 10000
     get_hits: 0
     get_misses: 0
     evictions: 0
     bytes_read: 5510000
     bytes_written: 5510000
     limit_maxbytes: 0
     threads: 5

Thanks Matt for pointing that one out.