Good Test Data

Ronald Bradford
January 28, 2014

Over the years you collect datasets you have created for various types of testing, seeding databases, etc. I have always thought this collection should be better managed for future re-use. Recently I wanted to do some “Big Data” playing, and the question of which datasets I could use led me to review the list I had previously collated at Seeking public data for benchmarks.

The types of things I wanted to do led me to realize that a lot of content is “public domain”, and Project Gutenberg is one great source of text in multiple languages. This was just one item on my wish list, but text-based data appears in blogs, comments, articles, microblogs, etc., and having multiple languages was important for some text analysis.

With a bit of thinking about the building blocks, I created Good Test Data: a way for me to have core data such as IP addresses, people’s names, User-Agent strings, text for articles and comments, and a lot more. Importantly, it also gives me the ability to generate large randomized amounts of this data quickly and easily.

Now I can build a list of 1 million random names with unique usernames and emails with ease. I can generate millions of varying articles, from a short microblog or comment to a blog post or multi-page article. I can then produce HTML/PDF/PNG versions, giving me file attachments. I’ve been playing more with image generation, creating banner images with varying text, and now I’m generating MP4 video to simulate the various standard sizes for advertising and just to see what people need.
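
To make the idea of random names with unique usernames and emails concrete, here is a minimal Python sketch. It is not the Good Test Data implementation. The seed lists, the `generate_people` function, the username scheme (first initial plus last name, with a numeric suffix on repeats) and the `example.com` domain are all assumptions chosen for illustration.

```python
import random

# Hypothetical seed lists; a real run would load far larger name files.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Erin", "Frank"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Garcia", "Brown", "Lee"]


def generate_people(count, seed=None, domain="example.com"):
    """Return `count` dicts with a random name, unique username and email.

    The username scheme and the domain are illustrative assumptions.
    """
    rng = random.Random(seed)  # a seed makes the output repeatable
    seen = {}                  # base username -> times used so far
    people = []
    for _ in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        base = f"{first[0]}{last}".lower()
        seen[base] = seen.get(base, 0) + 1
        # First use keeps the bare name; repeats get a numeric suffix.
        username = base if seen[base] == 1 else f"{base}{seen[base]}"
        people.append({
            "name": f"{first} {last}",
            "username": username,
            "email": f"{username}@{domain}",
        })
    return people


if __name__ == "__main__":
    for person in generate_people(5, seed=1):
        print(person)
```

Names repeat freely, as they do in real data, while the per-base counter keeps usernames and emails unique without having to retry on collisions. That keeps the cost linear, which matters when the target is a million rows.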

I’m not sure of the potential use and benefit for others, and that wasn’t the primary goal; however, I would like to know how these building blocks could be used. The data is relatively agnostic and can easily be loaded into MySQL tables. Depending on demand, pre-configured data for open source e-commerce, CRM or blogging products are all possible options.
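
One way generated rows could be loaded into MySQL is through a CSV file and a `LOAD DATA` statement. The Python sketch below assumes that route. The `rows_to_csv` and `load_data_statement` helpers, the `people` table and the `people.csv` path are hypothetical names for illustration, and the post does not say which loading mechanism Good Test Data actually uses.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    # Use "\n" line endings so they match the LOAD DATA statement below.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return buf.getvalue()


def load_data_statement(path, table, columns):
    """Build a MySQL LOAD DATA statement for a CSV produced above.

    `path` and `table` are hypothetical and inserted as-is; only pass
    trusted values, since nothing is escaped here.
    """
    column_list = ", ".join(columns)
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' IGNORE 1 LINES "
        f"({column_list});"
    )


if __name__ == "__main__":
    columns = ["name", "username", "email"]
    rows = [
        {"name": "Alice Smith", "username": "asmith",
         "email": "asmith@example.com"},
        {"name": "Lee, Bob", "username": "blee",
         "email": "blee@example.com"},
    ]
    print(rows_to_csv(rows, columns))
    print(load_data_statement("people.csv", "people", columns))
```

Keeping the generated data as plain delimited text is what makes it agnostic: the same file can feed MySQL or any other tool that reads CSV.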

Tagged with: Databases MySQL
