The answer is yes.
In this face off we have two numeric MySQL data types, both integers. In fact MySQL has nine different numeric data types for integer, fixed-precision and floating-point numbers; however, we are just going to focus on two: BIGINT and INT. This design consideration is part of my recent presentation Top 20 Design Tips for Data Architects.
What is the difference?
We turn first to the MySQL Reference Manual, where 10.1.1. Overview of Numeric Types gives the following.
INT[(M)] [UNSIGNED] [ZEROFILL]
A normal-size integer. The signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
BIGINT[(M)] [UNSIGNED] [ZEROFILL]
A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.
Ok, so an INT can store a value up to about 2.1 billion, and a BIGINT can store a value of up to 20 digits. The overview alone doesn't give much detail; we have to dig deeper, into 10.2. Numeric Types, where we find that an INT is a 4-byte integer and a BIGINT is an 8-byte integer.
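As a quick way to see these limits for yourself, here is a minimal sketch using a hypothetical scratch table (range_demo is my name, not part of any schema discussed here):

-- Hypothetical scratch table to illustrate the documented ranges
CREATE TABLE range_demo (
  i INT UNSIGNED,
  b BIGINT UNSIGNED
);

-- Both values are at the maximum of their type, so this succeeds
INSERT INTO range_demo VALUES (4294967295, 18446744073709551615);

-- 4294967296 exceeds INT UNSIGNED; in strict SQL mode this fails with
-- an out-of-range error (in non-strict mode the value is clamped)
INSERT INTO range_demo (i) VALUES (4294967296);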
So what’s the big deal?
Quite a lot actually. Using INT rather than BIGINT can give a significant reduction in disk space. This one change alone can save you 10%-20% (depending on your particular situation). More significantly, when the column is used as a primary key, and for foreign keys and indexes, the reduction in index size can reach 50%, which improves performance whenever those indexes are used.
My approach is this. Let's focus on primary keys and foreign keys to begin with. Are you going to store more than 2.1 billion rows in your table? For most tables the answer should be no. Should you say yes, then you do have grand plans, but you are also failing to consider the ramifications of handling larger data sets (a topic for later discussion).
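To make the primary/foreign key point concrete, here is a minimal sketch of a parent/child pair using INT UNSIGNED keys (the table names are hypothetical). Remember that InnoDB requires a foreign key column to match the size and signedness of the column it references, so an oversized BIGINT primary key forces every referencing table, and every index on those columns, to carry the extra bytes too:

-- Minimal parent/child sketch: the child's FK must match the parent's PK type
CREATE TABLE parent_demo (
  parent_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (parent_id)
) ENGINE=InnoDB;

CREATE TABLE child_demo (
  child_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  parent_id INT UNSIGNED NOT NULL,   -- 4 bytes here, in the index, and in the FK
  PRIMARY KEY (child_id),
  KEY idx_parent (parent_id),
  CONSTRAINT fk_parent FOREIGN KEY (parent_id)
    REFERENCES parent_demo (parent_id)
) ENGINE=InnoDB;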
There are exceptions to this rule. If you do a huge number of inserts and deletes, then while you may never hold 2.1 billion rows at once, you may have performed 2.1 billion inserts and exhausted the AUTO_INCREMENT range. Again, better design practices should be considered in this case.
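If churn like this is a concern, you can monitor how much of the key range has been consumed from information_schema. A sketch (the 4294967295 constant assumes an INT UNSIGNED key):

-- How close is each table's AUTO_INCREMENT counter to the INT UNSIGNED ceiling?
SELECT table_name,
       auto_increment,
       auto_increment / 4294967295 * 100 AS pct_used
FROM   information_schema.tables
WHERE  table_schema = 'sakila'
AND    auto_increment IS NOT NULL;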
The Test
As with everything, we need some evidence to support the claim. For this test we use the Sakila sample database.
We start with a simple intersection table that has a high proportion of numeric-only columns. This will show the best-case situation.
We create two copies of the table, one with all BIGINT columns and one with all INT columns, and then compare the sizes. These tables are only small, but they show the proportion of disk space saved.
CREATE TABLE inventory_bigint LIKE inventory;

ALTER TABLE inventory_bigint
  MODIFY inventory_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY film_id BIGINT UNSIGNED NOT NULL,
  MODIFY store_id BIGINT UNSIGNED NOT NULL;

INSERT INTO inventory_bigint SELECT * FROM inventory;

CREATE TABLE inventory_int LIKE inventory;

ALTER TABLE inventory_int
  MODIFY inventory_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY film_id INT UNSIGNED NOT NULL,
  MODIFY store_id INT UNSIGNED NOT NULL;

INSERT INTO inventory_int SELECT * FROM inventory;
SELECT table_name, engine, row_format, table_rows, avg_row_length,
       (data_length+index_length)/1024/1024 AS total_mb,
       (data_length)/1024/1024 AS data_mb,
       (index_length)/1024/1024 AS index_mb
FROM   information_schema.tables
WHERE  table_schema='sakila'
AND    table_name LIKE 'inventory%'
ORDER BY 6 DESC;

+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| table_name       | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb    |
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| inventory_bigint | InnoDB | Compact    |     293655 |             51 | 43.60937500 | 14.51562500 | 29.09375000 |
| inventory_int    | InnoDB | Compact    |     293715 |             37 | 29.54687500 | 10.51562500 | 19.03125000 |
| inventory        | InnoDB | Compact    |     293707 |             33 | 22.54687500 |  9.51562500 | 13.03125000 |
+------------------+--------+------------+------------+----------------+-------------+-------------+-------------+
3 rows in set (0.15 sec)
In this example, the data portion decreased from 14.5MB to 10.5MB, or 28%, and the index portion from 29.1MB to 19.0MB, or 34%.
CREATE TABLE customer_bigint LIKE customer;

ALTER TABLE customer_bigint
  MODIFY customer_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY store_id BIGINT UNSIGNED NOT NULL,
  MODIFY address_id BIGINT UNSIGNED NOT NULL,
  MODIFY active BIGINT UNSIGNED NOT NULL;

CREATE TABLE customer_int LIKE customer;

ALTER TABLE customer_int
  MODIFY customer_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  MODIFY store_id INT UNSIGNED NOT NULL,
  MODIFY address_id INT UNSIGNED NOT NULL,
  MODIFY active INT UNSIGNED NOT NULL;

SELECT table_name, engine, row_format, table_rows, avg_row_length,
       (data_length+index_length)/1024/1024 AS total_mb,
       (data_length)/1024/1024 AS data_mb,
       (index_length)/1024/1024 AS index_mb
FROM   information_schema.tables
WHERE  table_schema='sakila'
AND    table_name LIKE 'customer%'
ORDER BY 6 DESC;

+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| table_name      | engine | row_format | table_rows | avg_row_length | total_mb    | data_mb     | index_mb    |
+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
| customer_bigint | InnoDB | Compact    |     154148 |            139 | 37.09375000 | 20.54687500 | 16.54687500 |
| customer_int    | InnoDB | Compact    |     151254 |            121 | 30.06250000 | 17.51562500 | 12.54687500 |
| customer        | InnoDB | Compact    |      37684 |            125 |  7.81250000 |  4.51562500 |  3.29687500 |
| customer_list   | NULL   | NULL       |       NULL |           NULL |        NULL |        NULL |        NULL |
+-----------------+--------+------------+------------+----------------+-------------+-------------+-------------+
4 rows in set (0.22 sec)
In this example, the data portion decreased from 20.5MB to 17.5MB, or 15%, and the index portion from 16.5MB to 12.5MB, or 24%.
NOTE: The sample data set was increased for this example.
Conclusion
Even with these simple tables and small data sets it's clear that INT offers a disk space saving over BIGINT. Across many clients I've seen huge savings in multi-TB databases from just a small number of schema optimizations. Even if the saving from a more optimized database design were only 10%, it is an easy 10% that translates into a direct improvement in performance.
About the Author
Ronald Bradford provides Consulting and Advisory Services in Data Architecture, Performance and Scalability for MySQL Solutions. An IT industry professional for two decades with extensive database experience in MySQL, Oracle and Ingres, his expertise covers data architecture, software development, migration, performance analysis and production system implementations. His knowledge from 10 years of consulting across many industry sectors, technologies and countries has provided unique insight into providing solutions to problems. For more information Contact Ronald.