To enum or not to enum?

Ronald Bradford
January 23, 2006

I’ve never used database columns that embedded defined valid values within the schema definition. Within MySQL there are 2 definitions, ENUM and SET. There are a few reasons why, but first an explanation of these data types.

In summary, using the MySQL Sample Database.
 CREATE TABLE film ( film_id INT UNSIGNED NOT NULL AUTO_INCREMENT, ... rating ENUM('G','PG','PG-13','R','NC-17') DEFAULT 'G', special_features SET('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL, PRIMARY KEY (film_id) )ENGINE=InnoDB DEFAULT CHARSET=utf8; 

So from this, the following commands allow you to inspect this information via mysql.

 DESCRIBE film; SHOW COLUMNS FROM film LIKE 'rating'; 

With the introduction of the INFORMATION_SCHEMA in MySQL 5, a more traditional method using a valid SELECT statement.

 SELECT COLUMN_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA='sakila' AND TABLE_NAME='film' AND COLUMN_NAME = 'rating' 

This is great, however you have to know a lot of information, the table_name and column_name ok, they are fixed, but now the table_schema information is required to be known by the application.

Why I don’t use these?

Historically, there have been 3 reasons why I’ve never used ENUM.

First, it’s not standard SQL (at least to my knowledge), and historically hasn’t been consistent between Database products
More importantly as data, it’s terrible to maintain easily, as it’s within the table structure
The management of valid values within an application, and the need to manage this data dynamically.

Now, I’m not certain if within MySQL there are any funky ways to manage this type of information. I’d welcome comments on what people do. However from my previous experience, this is my method of implementation.

All information of this type, I refer to as a code. I implement this within a common table, defined as reference_data (based historically on the Oracle cg_ref_codes). Here is a cut down version of my definition for simplicity purposes.

 CREATE TABLE reference_data( id INT UNSIGNED NOT NULL AUTO_INCREMENT, type VARCHAR(20) NOT NULL, code VARCHAR(20) NOT NULL, value VARCHAR(100) NOT NULL, PRIMARY KEY (id) )ENGINE=InnoDB DEFAULT CHARSET=utf8; 

Of course, I’ve extended this basic structure to include ordering of data per type, default value of a given type, grouping of types (for larger systems) and a status to manage historically codes no longer valid for new data.

Within tables, I define all codes with an extention of _code so as to clearly identify the content of a given column.

So, for this example I would add data such as:

 INSERT INTO reference_data(type,code,value) VALUES ('rating','G','General Audience - All Ages Admitted'), ('rating','PG','Parental Guidance Suggested. Some Material May Not Be Suitable For Children'), ('rating','PG-13','Parents Strongly Cautioned. Some Material May Be Inappropriate For Children Under 13'), ('rating','R','Restricted. Under 17 Requires Accompanying Parent or Adult Guardian'), ('rating','NC-17','No One 17 And Under Admitted'); 

This for me provides the following benefits:

The Data is data, meaning it can be easily modified via normal application and sql functionality
Additional information such as a more general description can be provided and presented to end users
Additional attributes non shown in this example, provides ordering support, and a default value
The same set of data for a given type can be used in multiple tables

Now, the clear downside of this level of functionality is that the data integrity that is managed by ENUM at the database level has now been lost. You need to temper this with the level of access to your database, and you can easily implement via Database Triggers this level of security. However, I think that clear ability to tie the required values for an ENUM column to easy accessibility via an application, and allow for easy management is clearly a strength.

I’m yet to be convinced otherwise.

Tagged with: Databases General MySQL

MySQL and Heatwave Summit Presentation

Ronald Bradford
April 30, 2025

Last week I had the opportunity to speak at the MySQL and Heatwave Summit in San Francisco. I discussed the impact of the new MySQL 8.0 default caching_sha2_password authentication, replacing the mysql_native_password authentication that was the default for approximately 20 of the 30 years that MySQL has existed.

Readyset QueryPilot Announcement

Ronald Bradford
April 22, 2025

At the MySQL and Heatwave Summit 2025 today, Readyset announced a new data systems architecture pattern named Readyset QueryPilot . This architecture which can front a MySQL or PostgreSQL database infrastructure, combines the enterprise-grade ProxySQL and Readyset caching with intelligent query monitoring and routing to help support applications scale and produce more predictable results with varied workloads.

More CPUs or Newer CPUs

Ronald Bradford
April 2, 2025

In a CPU-bound database workload, regardless of price, would you scale-up or scale-new? What if price was the driving factor, would you scale-up or scale-new? I am using as a baseline the first available AWS Graviton2 processor for RDS (r6g).

To enum or not to enum?

Why I don’t use these?

Related Posts

MySQL and Heatwave Summit Presentation

Readyset QueryPilot Announcement

More CPUs or Newer CPUs