A Cassandra twitter clone

Following my successful Cassandra Cluster setup and having a potential client example to work with running Ruby On Rails (RoR), I came across the following examples in Ruby.

Not being a ruby developer, I thought it was time to investigate further. Starting first on Mac OS X 10.5, I found the first line example of installing cassandra via gem unsuccessful.

$ gem install cassandra
Updating metadata for 1 gems from http://gems.rubyforge.org
.
complete
ERROR:  could not find cassandra locally or in a repository

Some more reading highlights Otherwise, you need to install Java 1.6, Git 1.6, Ruby, and Rubygems in some reasonable way.

In case you didn’t read my earlier posts, Java 6 is installed, but not the default.

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
export PATH=$JAVA_HOME/bin:$PATH

I achieved installing RubyGems via Installing Ruby on Rails on Mac OS X.

$ sudo gem install rubygems-update

Updating metadata for 1 gems from http://gems.rubyforge.org
.
complete
Successfully installed rubygems-update-1.3.6
1 gem installed
Installing ri documentation for rubygems-update-1.3.6...
Installing RDoc documentation for rubygems-update-1.3.6...
Could not find main page README
Could not find main page README
Could not find main page README
Could not find main page README

$ sudo update_rubygems
RubyGems 1.3.6 installed

=== 1.3.6 / 2010-02-17

NOTE:

http://rubygems.org is now the default source for downloading gems.

You may have sources set via ~/.gemrc, so you should replace
http://gems.rubyforge.org with http://rubygems.org

http://gems.rubyforge.org will continue to work for the forseeable future.

New features:

* `gem` commands
  * Added `gem push` and `gem owner` for interacting with modern/Gemcutter
    sources
  * `gem dep` now supports --prerelease.
  * `gem fetch` now supports --prerelease.
  * `gem server` now supports --bind.  Patch #27357 by Bruno Michel.
  * `gem rdoc` no longer overwrites built documentation.  Use --overwrite
    force rebuilding.  Patch #25982 by Akinori MUSHA.
* Captial letters are now allowed in prerelease versions.

Bug fixes:

* Development deps are no longer added to rubygems-update gem so older
  versions can update sucessfully.
* Installer bugs:
  * Prerelease gems can now depend on non-prerelease gems.
  * Development dependencies are ignored unless explicitly needed.  Bug #27608
    by Roger Pack.
* `gem` commands
  * `gem which` now fails if no paths were found.  Adapted patch #27681 by
    Caio Chassot.
  * `gem server` no longer has invalid markup.  Bug #27045 by Eric Young.
  * `gem list` and friends show both prerelease and regular gems when
    --prerelease --all is given
* Gem::Format no longer crashes on empty files.  Bug #27292 by Ian Ragsdale.
* Gem::GemPathSearcher handles nil require_paths. Patch #27334 by Roger Pack.
* Gem::RemoteFetcher no longer copies the file if it is where we want it.
  Patch #27409 by Jakub Šťastný.

Deprecation Notices:

* lib/rubygems/timer.rb has been removed.
* Gem::Dependency#version_requirements is deprecated and will be removed on or
  after August 2010.
* Bulk index update is no longer supported.
* Gem::manage_gems was removed in 1.3.3.
* Time::today was removed in 1.3.3.


------------------------------------------------------------------------------

RubyGems installed the following executables:
	/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/gem

NOTE: This second command took over 60 seconds with no user feedback.

I was then able to successfully install cassandra via ruby’s gem package manager.

$ sudo gem install cassandra
Building native extensions.  This could take a while...
Successfully installed thrift-0.2.0
Successfully installed thrift_client-0.4.0
Successfully installed simple_uuid-0.1.0
Successfully installed cassandra-0.7.5
4 gems installed
Installing ri documentation for thrift-0.2.0...

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known
Installing ri documentation for thrift_client-0.4.0...
Installing ri documentation for simple_uuid-0.1.0...
Installing ri documentation for cassandra-0.7.5...
Installing RDoc documentation for thrift-0.2.0...

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known

Enclosing class/module 'thrift_module' for class BinaryProtocolAccelerated not known
Installing RDoc documentation for thrift_client-0.4.0...
Installing RDoc documentation for simple_uuid-0.1.0...
Installing RDoc documentation for cassandra-0.7.5...

My use of cassandra_helper provided the following expected dependency error.

$ cassandra_helper cassandra
Set the CASSANDRA_INCLUDE environment variable to use a non-default cassandra.in.sh and friends.
(in /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5)
You need to install git 1.6 or 1.7

I found instructions to install git at Installing git (OSX) and installed via GUI installer.

I had to include to my current session path to get my Ruby Cassandra installation.

$ export PATH=/usr/local/git/bin:$PATH

$ cassandra_helper cassandra
Set the CASSANDRA_INCLUDE environment variable to use a non-default cassandra.in.sh and friends.
(in /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5)
Checking Cassandra out from git
Initialized empty Git repository in /Users/rbradfor/cassandra/server/.git/
remote: Counting objects: 16715, done.
remote: Compressing objects: 100% (2707/2707), done.
remote: Total 16715 (delta 9946), reused 16011 (delta 9364)
Receiving objects: 100% (16715/16715), 19.22 MiB | 1.15 MiB/s, done.
Resolving deltas: 100% (9946/9946), done.
Updating Cassandra.
Buildfile: build.xml

clean:

BUILD SUCCESSFUL
Total time: 2 seconds
HEAD is now at 298a0e6 check-in debian packaging
Building Cassandra
Buildfile: build.xml

build-subprojects:

init:
    [mkdir] Created dir: /Users/rbradfor/cassandra/server/build/classes
    [mkdir] Created dir: /Users/rbradfor/cassandra/server/build/test/classes
    [mkdir] Created dir: /Users/rbradfor/cassandra/server/src/gen-java

check-gen-cli-grammar:

gen-cli-grammar:
     [echo] Building Grammar /Users/rbradfor/cassandra/server/src/java/org/apache/cassandra/cli/Cli.g  ....

build-project:
     [echo] apache-cassandra-incubating: /Users/rbradfor/cassandra/server/build.xml
    [javac] Compiling 247 source files to /Users/rbradfor/cassandra/server/build/classes
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

build:

BUILD SUCCESSFUL
Total time: 42 seconds
CASSANDRA_HOME: /Users/rbradfor/cassandra/server
CASSANDRA_CONF: /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5/conf
Listening for transport dt_socket at address: 8888
DEBUG - Loading settings from /Library/Ruby/Gems/1.8/gems/cassandra-0.7.5/conf/storage-conf.xml
....

I was then able to complete the example at up and running with cassandra running via the ruby interactive console.

I was also able to fire up the cassandra-cli and see the data added in ruby.

$ bin/cassandra-cli -host localhost
Connected to localhost/9160
cassandra> get Twitter.Statuses['1']
=> (column=user_id, value=5, timestamp=1267072406503471)
=> (column=text, value=Nom nom nom nom nom., timestamp=1267072406503471)
Returned 2 results.
cassandra> get Twitter.UserRelationships['5'];
=> (super_column=user_timeline,
     (column=???!??zvZ+?!, value=1, timestamp=1267072426991872)
     (column=??-?!???C?th?, value=2, timestamp=1267072427019091))
Returned 1 results.

No sure about the data in the second example.

Getting started with Cassandra

With the motivation from today’s public news on Twitter’s move from MySQL to Cassandra, my own skills desire following in-depth discussions at last November’s Open SQL Camp to consider Cassandra and yesterday’s discussion with a new client on persistent key-value store products, today I download installed and configured for the first time. Not that today’s news was unexpected, if you follow the Twitter Engineering Open Source projects you would have seen Cassandra as well as other products being used or evaluated by Twitter.

So I went from nothing to a working Cassandra node in under 5 minutes. This is what I did.

  1. While I knew this was an Apache project, a Google Search yields for me the 3rd link for the The Apache Cassandra Project at http://incubator.apache.org/cassandra/. Congrats for Cassandra now a top level Apache Project. This url will update soon.
  2. Download Cassandra. Hard to miss with a big green button on home page. Current version is 0.5
  3. I read Getting Started, which is the 3rd top level link on menu after Home and Download. Step 1 is picking a version which I’ve already done, Step 2 is Running a single node.
  4. The Getting Started indicated a problem on Mac OS X for the required minimum Java version. I was installing on Mac OS X 10.5 and CentOS 5.4. I’ve experienced this Java 6 default path issue before. Set my JAVA_HOME and PATH accordingly (after I updated the wiki with correct value)
  5. I extracted the tar file, changed to the directory and took at look at the README.txt file. Yes, I always check this first with any software and relevant because it includes valuable instructions on creating the default data and log directories.
  6. Start with bin/cassandra -f. No problems!
  7. I then followed the instructions from the link in Step 2 with the CassandraCli. This tests and confirms the installation is operational.

Ok, a working environment. I’ve now installed on a second machine and tested however I now need to configure the cluster, and the documentation is not as straightforward. Time to try out Google again.

On a side note, this is one reason why I love Open Source. I followed the instructions online and found a mistake in the Mac OS X path, I simply registered and corrected providing the benefit of my experience for the next reader(s).

You may also like to view future posts including.

O'Reilly Twitter Boot Camp a success

The first O’Reilly Twitter Boot Camp#OTBC was held in New York as a pre cursor to 140 Characters Conference#140conf on Monday 15th June, 2009.

With opening and closing keynotes were like matching bookends of The Twitter Book #twitterbook offered to all attendees and authored by the keynoters @timoreilly and @SarahM.

Attendees came from across the country. Just a few I spoke with coming from LA – @EricMueller of @FLWbooks, Texas – @marlaerwin , Vancouver – HootSuite, Las Vegas -zappos, Boston – @mvolpe , Philadelphia, @SBrownCCI from Cincinnati and @sticky_mommy from Vermont.

The demographics of attendees was a little different from my usual O’Reilly conferences of MySQL, OSCON and Web 2.0. There were less the half the attendees with laptops at hand for notes & twittering, offset by the high blackberry or should I say shaq-berry users (Thanks Ami @digitalroyalty), easily seen from the back of the steep and dark auditorium. A greater proportion of different industries and gender lead to many questions and discussions from users, not just technologists.

The morning panel sessions afforded no question time due to speakers providing good but overtime content. Over lunch Mike Volpe of HubSpot a corporate sponsor for the day set the standard by asking his panel of speakers to stick on time. This afforded almost 30 minutes of question time and a roar of approval from the crowd.

There is a lot of valuable information you can find by Twitter Search of #OTBC. A few examples include:

  • @archivesnext: Good advice: RT @mpedson RT @timoreilly: Twitter usage policy from @zappos at #OTBC: “Just be real and use your best judgement.”
  • @GeekGirlCamp: Hmmmm. Lots of conflicting views on following on Twitter here. What makes YOU follow someone? Would love to know… #OTBC
  • @CarriBugbee: ROI is a tricky thing on twitter; if you’re using it solely to generate revenue, you might be there for the wrong reason – @wholefoods #otbc
  • @mvolpe: “Driving ROI on Twitter” slides and video of my presentation later today for #OTBC – http://tinyurl.com/061509mvolpe
  • @ronaldbradford: Best Practices for Twitter – Build a commercial-grade profile. @CarriBugbee at #OTBC
  • @journalismgal: Ask questions within your tweets even something as simple as your fab apple #otbc
  • @ronaldbradford: Do stay tuned in. Nights, weekends, holidays are all twitter time. Maria Erwin @wholefoods at #OTBC
  • @harrybrelsford: Is Twitter the new Google? That is belief of @erictpeterson Twitter is creating entire new businesses (Flash Light books) #otbc #smbnation

My individual brands of @ronaldbradford and @MySQLExpert will certainly benefit from a wealth of knowledge of the day. If only I had my Twitter name on the tee shirt I was wearing for the event.

The only down sides to the venue the lack of power for attendees, flaky Internet and a basement auditorium with no cell phone service. Important things to re-consider for a online technology conference. In true form the attendees including myself @ronaldbradford, @SBrownCCI, @GeekGirlCamp, @14str8 used the medium of the conference and our voices were heard and some limited power made available. Thanks O’Reilly for listening.

Thank you to all speakers @katmeyer, @timoreilly, @steverubel, @zappos, @carribugbee, @twittermoms, @flwbooks, @davidjdeal, @bethharte, @dunkindonuts, @reggiebradford, @wholefoods, @tedmurphy, @adbroad, @digitalroyalty, @erictpeterson, @mvolpe, @laureltouby, @sarahm and to Zappos.com for the after event happy hour.

Leveraging the power of Twitter

Last week I posted the following twitter request“Can somebody loan me (or buy me) a Dell 2950 decked out so I can run and publish some benchmarks. Please!”

In a same day response I was offered access to use 2 x Dell 1950’s, and today I’m now actually using these machines for my own testing. I would like to thank cafemom (Barry, Anthony & Dan) for the loan of hardware.

And now the chance to better understand the RAID configuration of the DELL PERC Controllers, trying out some different RAID types, LVM configurations and disk tests. When I’m done with my System Administrator refresher, I’m then be trying some different MySQL Benchmarks to test various MySQL configuration settings including using the new Juice benchmark.

Twitter Tips

I have in the past questioned the value of Twitter as an effective business tool, but it continues to defy the trend of inability to bridge the business gap with social media.

Even with still continual growth problems (at least it’s not down as much) Twitter is everywhere I go, see or do. You see it at business events, business cards, meetups even on CNN Headline News. There are so many various differ twitter sites, applications, widgets etc, I’m surprised there isn’t a twitter index just of the twitter related sites.

I have now incorporated Twitter into my professional site and I’m using this micro-blogging approach more to share my professional skills and interests to my growing band of followers. I don’t expect to make the Twitter top list which is headed by CNN Breaking News with 667,353 followers.

Even Lance Armstrong (who rates 9th) used Twitter for press releases this week of his injuries.

For more reading check out How Twitter Makes You A Better Writer and 27 Twitter Applications Your Small Business Can Use Today.

I was surprised to see How to get a job by blogging: Tips for a setting up the kind of professional blog that will get you hired, barely mention Twitter.

Now be sure to add a background appropriate to your Twitter. This one is wicked.

NY Tech 1995-2008. Opening Web 2.0 Expo NY Keynote

Web 2.0 Expo NY keynotes are happening today. Technology in use included CrowdVine which I’d not heard of, and plenty of Twitter feeds such as w2e_NY08.

The opening keynote was Fred Wilson from Union Square Ventures with his presentation New York’s Web Industry From 1995 to 2008: From Nascent to Ascendent .

Some stats, Seed and early stage deals.

  • 1995 230 SF Bay area, 30 in NY
  • 2008 360 SF Bay area, 116 in NY

Fred first asked “New York is not an alley. Call it Broadway, or just New York.”

Here is a summary of his history of New York Web Industry.

  • 1991 – ZDNet
  • 1993 – New York Online Dialup services
  • 1993 – Jupiter Communications online conference
  • 1993 – Prodigy
  • 1994 – Startups such as Pseudo, Total New york, Razorfish.
  • 1994 – Time Warner Pathfinder
  • 1995 -NYIC 55 Broad St. – Technology oriented building
  • 1995 – Seth Godin – Yoyodyne – Permission Marketing
  • 1995 – itraffic, agency.com, NY Times online
  • 1995 – Softbank, Double Click, 24×7, Real Media
  • 1996 – Silicon Alley Reporter
  • 1996 – ivillage, the knot
  • 1996 – Flatiron Partners – good sued for that
  • 1997 – The Silicon Alley Report Radio Show
  • 1997 – mining co.
  • 1997 – Total NY sold to AOL
  • 1997 – Agency rollups razorfish buying 4 companies
  • 1997 – DoubleClick IPO
  • 1998 – Seth Godin moves to Yahoo
  • 1998 – Burn Rate
  • 1998 – Kozmo – We’ll be right over
  • 1998 – was the last year of sanity in the Internet wave
  • 1999 – The start of the boom
  • 1999 the big players came online , all hell breaks loose. 200 startups were funded in 1999, 300 in 2000.
  • 2000 – The Crash & Burn
  • 2000 – f**kedcompany
  • 2000 – Google came to New York. – 86th St Starbucks
  • 2001 – Layoffs, Landlords and bankruptcies
  • 2002 – Rock bottom
  • 2003 – Renewal
  • 2003 – Blogging started gizmodo
  • 2003 – Web 2.0 coined
  • 2003 – del.icio.us was launched from a computer in an apartment
  • 2004 – NY Tech Meetup
  • 2004 – Union Square Ventures $120million raised
  • 2005 – about.com acquired by NY Times
  • 2005 – Etsy
  • 2006 – Google took over port authority building, now with 750 engineers in NY
  • 2008 – Web 2.0 comes to New York City

New York is now 1/3 of Silicon valley, compared to 1/8 of funded Internet companies.

One thing mentioned is a documentary called “We live in Public”. Some of the footage from 1999, is so early Big Brother.