The database frontier

Jay’s opening line introducing the final MySQL Conference keynote speaker was: “I work with a lot of data. I think peta-bytes, maybe exa-bytes”. He was referring to Jacek Becla from the Stanford Linear Accelerator Center, who presented “The Science and Fiction of Petascale Analytics”.

The goal of the Large Synoptic Survey Telescope (LSST) is to store 50+ PB of images and 20+ PB of data.
To put that size in perspective: 20 PB of data = 20 years of HD movies = 2000 years of 128 kbit/s MP3 audio.
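Comparisons like this depend heavily on the bitrate assumed. A rough sketch for checking them yourself, assuming decimal petabytes (10^15 bytes), a constant bitrate, and a 365.25-day year; the function name `playback_years` is made up for this illustration:

```python
# Assumption: a Julian year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Assumption: decimal petabyte (10^15 bytes), not the binary pebibyte.
PETABYTE = 10**15


def playback_years(size_bytes: float, bitrate_bits_per_sec: float) -> float:
    """Return the years of continuous playback that fit in size_bytes."""
    seconds = size_bytes * 8 / bitrate_bits_per_sec
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 128 kbit/s is taken here as 128,000 bits per second.
    years = playback_years(20 * PETABYTE, 128_000)
    print(f"20 PB at 128 kbit/s: about {years:,.0f} years of audio")
```

Changing the bitrate or the unit convention can shift the answer by orders of magnitude, so figures like these are best read as ballpark illustrations.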

The next database frontier is obviously building huge databases. What part will MySQL or other relational databases play? Some interesting facts:

The digital universe created 161 exabytes of data last year.
Google processes 20 petabytes of data per day.

The LSST operational plan spans 10 years, starting only in 2014. The project timeline:

2009 Choosing technology
2010-2014 Construction
2014-2023 Production

The primary goals are scalability, parallelization, and fault tolerance.

Comments

Arjen Lentz says

April 22, 2008 at 7:05 pm

Does it even make sense to stick this in an SQL/RDBMS?

Fran says

April 22, 2008 at 8:41 pm

I agree with Arjen; such vast amounts of data may well push relational
databases beyond their real limits (of course, RDBMSs are going to keep
evolving).
Without getting too “googly” :-] I think the future of massive data processing
will involve drastic architectural reengineering in terms of data
(de)structuring and parallelization.