What is the optimal thread specific buffer size?

So you want to know what join_buffer_size, sort_buffer_size, read_buffer_size and read_rnd_buffer_size values for your application should be? These MySQL thread specific buffers are variables I can never get right because there are insufficient metrics, instrumentation or even abstract details with the present MySQL Versions. These are important because Memory is a resource that you want to maximum towards your database data (the System Global Area), and not towards the Process Global Area in which there is no limitations.

I’ve wanted to know this answer for some time, I’ve asked many people including internal MySQL resources, I’d hoped that when joining MySQL more details would be available, but I’ve never been able to get an answer. I’ve always been meaning to work this out, it’s now 2008 and well the time has now come to do something about it.

The questions I want to answer include?

  • When are these buffers used, i.e. which SQL Statements trigger these. Ok, you can work this out when you review the SQL + Schema but I want something that works for clients.
  • I want to know if the buffers are fully allocated or incrementally allocated. My understanding is that for these 4 buffers, there are pre-allocated unlike internal temporary tables, but since reading some of the code I’m no longer completely convinced.
  • I want to know what size of the buffer (if fully allocated) is actually used.

My initial goal is to add the following new status variables.

  • sort_buffer_count, sort_buffer_usage
  • read_buffer_count, read_buffer_usage
  • read_rnd_buffer_count, read_rnd_buffer_usage
  • join_buffer_count, join_buffer_usage

Now, it’s likely there will be some overlap, for example sort_scan, but nothing like introducing some consistency here.

There per session status variables will allow you when reviewing individual SQL statements to see the impact. Ultimately I’d like to add more smarts into MySQL, because things like “gathering all SQL Statements” can be difficult and often not possible in production. Combined with leverage MySQL Proxy we will get there.

I’m looking forward to getting in the bows of the MySQL Source Code, it’s going to be a steep curve as I do little development these days, less in C++ but nothing like a challenge and questions to a lot of close colleagues for small snippets of help.

Tagged with: Databases MySQL

Related Posts

More CPUs or Newer CPUs

In a CPU-bound database workload, regardless of price, would you scale-up or scale-new? What if price was the driving factor, would you scale-up or scale-new? I am using as a baseline the first available AWS Graviton2 processor for RDS (r6g).

Read more

An Interesting Artifact with AWS RDS Aurora Storage

As part of using public datasets with my own Benchmarking Suite I wanted upsize a dataset for larger volume testing. I have always used the INFORMATION_SCHEMA.TABLES data_length and index_length columns as a sufficiently accurate measurement for actual disk space used.

Read more

How long does it take the ReadySet cache to warm up?

During my setup of benchmarking I run a quick test-sysbench script to ensure my configuration is right before running an hour+ duration test. When pointing to a Readyset cache where I have cached the 5 queries used in the sysbench test, but I have not run any execution of the SQL, throughput went up 10x in 5 seconds.

Read more