An Interesting Artifact with AWS RDS Aurora Storage

An Interesting Artifact with AWS RDS Aurora Storage

As part of using public datasets with my own Benchmarking Suite I wanted upsize a dataset for larger volume testing.

I have always used the INFORMATION_SCHEMA.TABLES data_length and index_length columns as a sufficiently accurate measurement for actual disk space used. FWIW table_rows and avg_row can be wildly inaccurate.

As you can see here, you can compare with a self-hosted database the underlying InnoDB Tablespaces align with the provided values (27GB and 3.5G).

Comparing INFORMATION_SCHEMA with individual InnoDB Tablespaces

+----------------------+--------+---------+------------+---------+----------+----------+----------+
| table_name           | ENGINE | format  | TABLE_ROWS | avg_row | total_mb | data_mb  | index_mb |
+----------------------+--------+---------+------------+---------+----------+----------+----------+
| title_principal      | InnoDB | Dynamic |  363705547 |      49 | 27181.00 | 17009.00 | 10172.00 |
| title_episode        | InnoDB | Dynamic |   32860408 |      51 |  3327.00 |  1626.00 |  1701.00 |
+----------------------+--------+---------+------------+---------+----------+----------+----------+

$ ls -lh title_principal.ibd title_episode.ibd
-rw-r----- 1 mysql mysql 3.5G Mar 27 22:00 title_episode.ibd
-rw-r----- 1 mysql mysql  27G Mar 27 23:38 title_principal.ibd

With this verification, I upsized the original IMDb Dataset which was ~20GB, to 4x or ~80GB. In MySQL the results matched my expectations.

IMDb 4x Dataset In MySQL 8.4

This is from a clean load of data and is comparable with a 84GB dataset built up in MySQL 8.0.

+--------------+----------------+----------------+----------------+--------+------------+
| TABLE_SCHEMA | total_mb       | data_mb        | index_mb       | tables | today      |
+--------------+----------------+----------------+----------------+--------+------------+
| imdb         | 83996.53125000 | 48402.57812500 | 35593.95312500 |     13 | 2025-03-28 |
+--------------+----------------+----------------+----------------+--------+------------+

+----------------------+--------+---------+------------+---------+----------+----------+----------+
| table_name           | ENGINE | format  | TABLE_ROWS | avg_row | total_mb | data_mb  | index_mb |
+----------------------+--------+---------+------------+---------+----------+----------+----------+
| title_principal      | InnoDB | Dynamic |  363705547 |      49 | 27181.00 | 17009.00 | 10172.00 |
| title_name_character | InnoDB | Dynamic |  169484096 |      47 | 20769.97 |  7719.00 | 13050.97 |
| title                | InnoDB | Dynamic |   42646882 |     105 |  7706.00 |  4299.00 |  3407.00 |
| name                 | InnoDB | Dynamic |   53712993 |      58 |  6734.98 |  3013.00 |  3721.98 |
| name_profession      | InnoDB | Dynamic |   36270238 |     151 |  6515.00 |  5248.00 |  1267.00 |
| name_known_for       | InnoDB | Dynamic |  104256630 |      48 |  5985.00 |  4853.00 |  1132.00 |
| title_genre          | InnoDB | Dynamic |   63167541 |      74 |  5599.88 |  4457.88 |  1142.00 |
| title_episode        | InnoDB | Dynamic |   32860408 |      51 |  3327.00 |  1626.00 |  1701.00 |
| title_rating         | InnoDB | Dynamic |    6050616 |      30 |   177.67 |   177.67 |     0.00 |
| credit               | InnoDB | Dynamic |         13 |    1260 |     0.02 |     0.02 |     0.00 |
| genre                | InnoDB | Dynamic |         28 |     585 |     0.02 |     0.02 |     0.00 |
| characters           | NULL   | NULL    |       NULL |    NULL |     NULL |     NULL |     NULL |
| popular_titles       | NULL   | NULL    |       NULL |    NULL |     NULL |     NULL |     NULL |
+----------------------+--------+---------+------------+---------+----------+----------+----------+

+----------------------+-------------+
| tbl                  | row_count   |
+----------------------+-------------+
| credit               | 13          |
| genre                | 28          |
| name                 | 56,870,364  |
| name_known_for       | 91,517,992  |
| name_profession      | 63,344,864  |
| title                | 44,458,976  |
| title_episode        | 34,345,816  |
| title_genre          | 69,519,892  |
| title_name_character | 174,171,912 |
| title_principal      | 358,389,012 |
| title_rating         | 6,061,472   |
+----------------------+-------------+

Aurora Storage

However, with AWS RDS Aurora MySQL 8.0 I was surprised to discover the reported disk space is only 60% in comparison. As with a managed service, I have no way to independently verify these numbers as accurate. Aurora is an independent highly-available shared storage layer for RDS Aurora.

This is an interesting fact. If you have a 1TB database in RDS and when you are evaluating migrating to RDS Aurora you would use the same size for calculating storage in RDS Aurora, however this could be an overestimate.

IMDb 4x Dataset in RDS Aurora MySQL 8.0

+--------------+----------------+----------------+----------------+--------+------------+
| TABLE_SCHEMA | total_mb       | data_mb        | index_mb       | tables | today      |
+--------------+----------------+----------------+----------------+--------+------------+
| imdb         | 49865.37500000 | 29458.53125000 | 20406.84375000 |     13 | 2025-03-28 |
+--------------+----------------+----------------+----------------+--------+------------+
1 row in set (0.01 sec)

+----------------------+--------+---------+------------+---------+----------+---------+----------+
| table_name           | ENGINE | format  | TABLE_ROWS | avg_row | total_mb | data_mb | index_mb |
+----------------------+--------+---------+------------+---------+----------+---------+----------+
| title_principal      | InnoDB | Dynamic |  180667894 |      46 | 12864.98 | 8045.00 |  4819.98 |
| title_name_character | InnoDB | Dynamic |   85801233 |      44 |  9892.97 | 3635.00 |  6257.97 |
| name                 | InnoDB | Dynamic |   56778447 |      56 |  7048.98 | 3075.00 |  3973.98 |
| name_profession      | InnoDB | Dynamic |   58811983 |      87 |  6138.98 | 4901.98 |  1237.00 |
| name_known_for       | InnoDB | Dynamic |   90662783 |      53 |  5675.00 | 4599.00 |  1076.00 |
| title                | InnoDB | Dynamic |   22336453 |      93 |  3645.00 | 1996.00 |  1649.00 |
| title_genre          | InnoDB | Dynamic |   29932810 |      72 |  2625.92 | 2055.92 |   570.00 |
| title_episode        | InnoDB | Dynamic |   18534956 |      60 |  1892.91 | 1070.00 |   822.91 |
| title_rating         | InnoDB | Dynamic |    3033301 |      27 |    80.59 |   80.59 |     0.00 |
| credit               | InnoDB | Dynamic |         13 |    1260 |     0.02 |    0.02 |     0.00 |
| genre                | InnoDB | Dynamic |         28 |     585 |     0.02 |    0.02 |     0.00 |
| characters           | NULL   | NULL    |       NULL |    NULL |     NULL |    NULL |     NULL |
| popular_titles       | NULL   | NULL    |       NULL |    NULL |     NULL |    NULL |     NULL |
+----------------------+--------+---------+------------+---------+----------+---------+----------+
13 rows in set (0.00 sec)

+----------------------+-------------+
| tbl                  | row_count   |
+----------------------+-------------+
| credit               | 13          |
| genre                | 28          |
| name                 | 56,942,588  |
| name_known_for       | 91,636,620  |
| name_profession      | 63,419,680  |
| title                | 44,524,244  |
| title_episode        | 34,398,208  |
| title_genre          | 69,632,760  |
| title_name_character | 174,422,448 |
| title_principal      | 358,944,016 |
| title_rating         | 6,075,560   |
+----------------------+-------------+
Tagged with: MySQL RDS Aurora

Related Posts

Easy Money-Saving Tips for Your AWS Cloud Spend

There are numerous Cloud Service Provider (CSP) FinOps products that can review, collate, summarize, and recommend ways to optimize your cloud spend. If you’re using one or more cloud providers and don’t actively manage your Cost and Usage Reports (CURs) on a daily basis, investing in such a product is a smart move.

Read more

How long does it take the ReadySet cache to warm up?

During my setup of benchmarking I run a quick test-sysbench script to ensure my configuration is right before running an hour+ duration test. When pointing to a Readyset cache where I have cached the 5 queries used in the sysbench test, but I have not run any execution of the SQL, throughput went up 10x in 5 seconds.

Read more

Monitoring Latency with Throughput

Higher throughput does not imply improved performance. This is a common problem when the need for an application to support more users, you provide higher concurrency and that appears to show the capability to support higher throughput.

Read more