Good Test Data

Ronald Bradford
January 28, 2014

Over the years you collect datasets you have created for various types of testing, seeding databases etc. I have always thought one needs to better manage this for future re-use. Recently I wanted to do some “Big Data” playing and again that question of what datasets can I use let me to review the past collated list at Seeking public data for benchmarks .

The types of things I was wanting to do lead me to realize a lot of content is “public domain” and Project Gutenberg is just one great source of text in multiple languages. This was just one aspect of my wish list but text based data is used from blogs, comments, articles, microblogs etc, and multiple languages was important from some text analysis.

With a bit of thinking about the building blocks, I created Good Test Data . A way for me to have core data, IP’s, people’s names, User Agents strings, text for articles, comments and a lot more. And importantly the ability to generate large randomized amounts of this data quickly and easily.

Now I can build a list of 1 million random names with unique usernames and emails with ease. I can generate millions of varying articles, from a short microblog, a comment, a blog to a multi page article. Then be able to produce HTML/PDF/PNG versions giving me file attachments. I’ve been playing more with image generation, creating banner images with varying text, and now I’m generating MP4 video to simulate the various standard sizes for advertising and just to see what people need.

I’m not sure of the potential use and benefit for others and that wasn’t the primary goal, however I would like to know how these building blocks could be used. The data is relatively agnostic, being able to easily load into MySQL tables. Depending on demand, being able to create pre-configured open source product data for e-commence products, CRM or blogging are all possible options.

Tagged with: Databases MySQL

MySQL and Heatwave Summit Presentation

Ronald Bradford
April 30, 2025

Last week I had the opportunity to speak at the MySQL and Heatwave Summit in San Francisco. I discussed the impact of the new MySQL 8.0 default caching_sha2_password authentication, replacing the mysql_native_password authentication that was the default for approximately 20 of the 30 years that MySQL has existed.

Readyset QueryPilot Announcement

Ronald Bradford
April 22, 2025

At the MySQL and Heatwave Summit 2025 today, Readyset announced a new data systems architecture pattern named Readyset QueryPilot . This architecture which can front a MySQL or PostgreSQL database infrastructure, combines the enterprise-grade ProxySQL and Readyset caching with intelligent query monitoring and routing to help support applications scale and produce more predictable results with varied workloads.

More CPUs or Newer CPUs

Ronald Bradford
April 2, 2025

In a CPU-bound database workload, regardless of price, would you scale-up or scale-new? What if price was the driving factor, would you scale-up or scale-new? I am using as a baseline the first available AWS Graviton2 processor for RDS (r6g).

Good Test Data

Related Posts

MySQL and Heatwave Summit Presentation

Readyset QueryPilot Announcement

More CPUs or Newer CPUs