The database frontier

Jay’s opening lines regarding the final MySQL Conference keynote speaker was: “I work with a lot of data. I think peta-bytes, maybe exa-bytes”. This was relating to Jacek Becla from the Stanford Linear Accelerator Center, giving his presentation on “The Science and Fiction of Petascale Analytics”.

The goal of the Large Synoptics Survey Telescope (LSST) is the storage of 50+ PB of images and 20+ PB data.
Let’s just clarify the size. 20 PB of data = 20 years of HD Movies = 2000 years of 128kb MP3

The next database frontier is obviously building huge databases. What part will MySQL or other relational databases play? Some interesting facts were.

  • The Digital Universe Created 161 Exabytes of data last year.
  • Google, processes 20 petabytes of data per day.

The Operational plan for LSST Project Timeline is 10 years, only starting in 2014. The timeline:

  • 2009 Choosing Technology
  • 2010-2014 constructions
  • 2014-2023 production

The primary goals are: Scale, parallelize, fault tolerant.

Tagged with: Databases MySQL MySQL User Conferences MySQL Users Conference 2008

Producing Skewness statistics with SQL

Skewness measures the asymmetry of a distribution. A perfectly symmetric distribution has a skewness of zero. A positive skew (right-skewed) means the tail extends to the right — a small number of high values pull the mean above the median.

Exploring the vsql-ai extension

The vsql-ai extension adds AI prompt capabilities and text embeddings directly in SQL queries, with support for Anthropic Claude , Google Gemini , OpenAI ChatGPT , or a local LLM such as Ollama .

Producing Chi-Squared statistics with SQL

The Chi-Squared test is one of the most widely used statistical tests for categorical data. It comes in two flavors: the goodness-of-fit test asks whether an observed frequency distribution matches an expected one, while the test of independence asks whether two categorical variables are associated with each other.