To enum or not to enum?

I’ve never used database columns that embedded defined valid values within the schema definition. Within MySQL there are 2 definitions, ENUM and SET. There are a few reasons why, but first an explanation of these data types.

In summary, using the MySQL Sample Database.
<br /> CREATE TABLE film (<br /> film_id INT UNSIGNED NOT NULL AUTO_INCREMENT,<br /> ...<br /> rating ENUM('G','PG','PG-13','R','NC-17') DEFAULT 'G',<br /> special_features SET('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') DEFAULT NULL,<br /> PRIMARY KEY (film_id)<br /> )ENGINE=InnoDB DEFAULT CHARSET=utf8;<br />

So from this, the following commands allow you to inspect this information via mysql.

<br /> DESCRIBE film;<br /> SHOW COLUMNS FROM film LIKE 'rating';<br />

With the introduction of the INFORMATION_SCHEMA in MySQL 5, a more traditional method using a valid SELECT statement.

<br /> SELECT COLUMN_TYPE<br /> FROM INFORMATION_SCHEMA.COLUMNS<br /> WHERE TABLE_SCHEMA='sakila'<br /> AND TABLE_NAME='film'<br /> AND COLUMN_NAME = 'rating'<br />

This is great, however you have to know a lot of information, the table_name and column_name ok, they are fixed, but now the table_schema information is required to be known by the application.

Why I don’t use these?

Historically, there have been 3 reasons why I’ve never used ENUM.

  • First, it’s not standard SQL (at least to my knowledge), and historically hasn’t been consistent between Database products
  • More importantly as data, it’s terrible to maintain easily, as it’s within the table structure
  • The management of valid values within an application, and the need to manage this data dynamically.

Now, I’m not certain if within MySQL there are any funky ways to manage this type of information. I’d welcome comments on what people do. However from my previous experience, this is my method of implementation.

All information of this type, I refer to as a code. I implement this within a common table, defined as reference_data (based historically on the Oracle cg_ref_codes). Here is a cut down version of my definition for simplicity purposes.

<br /> CREATE TABLE reference_data(<br /> id INT UNSIGNED NOT NULL AUTO_INCREMENT,<br /> type VARCHAR(20) NOT NULL,<br /> code VARCHAR(20) NOT NULL,<br /> value VARCHAR(100) NOT NULL,<br /> PRIMARY KEY (id)<br /> )ENGINE=InnoDB DEFAULT CHARSET=utf8;<br />

Of course, I’ve extended this basic structure to include ordering of data per type, default value of a given type, grouping of types (for larger systems) and a status to manage historically codes no longer valid for new data.

Within tables, I define all codes with an extention of _code so as to clearly identify the content of a given column.

So, for this example I would add data such as:

<br /> INSERT INTO reference_data(type,code,value) VALUES<br /> ('rating','G','General Audience - All Ages Admitted'),<br /> ('rating','PG','Parental Guidance Suggested. Some Material May Not Be Suitable For Children'),<br /> ('rating','PG-13','Parents Strongly Cautioned. Some Material May Be Inappropriate For Children Under 13'),<br /> ('rating','R','Restricted. Under 17 Requires Accompanying Parent or Adult Guardian'),<br /> ('rating','NC-17','No One 17 And Under Admitted');<br />

This for me provides the following benefits:

  • The Data is data, meaning it can be easily modified via normal application and sql functionality
  • Additional information such as a more general description can be provided and presented to end users
  • Additional attributes non shown in this example, provides ordering support, and a default value
  • The same set of data for a given type can be used in multiple tables

Now, the clear downside of this level of functionality is that the data integrity that is managed by ENUM at the database level has now been lost. You need to temper this with the level of access to your database, and you can easily implement via Database Triggers this level of security. However, I think that clear ability to tie the required values for an ENUM column to easy accessibility via an application, and allow for easy management is clearly a strength.

I’m yet to be convinced otherwise.

Tagged with: Databases General MySQL

Related Posts

Why Being Proactive Is Always a Winning Approach

Many companies manage production infrastructure using a reactive model rather than a proactive one. Organizations typically react to warnings and alerts, then implement corrective actions in response. While some companies have well-designed architectural patterns—such as feature flags and rate limiting—that can quickly mitigate the impact of issues, these are merely temporary solutions, not resolutions.

Read more

AWS CLI support for Aurora DSQL and S3 Tables

If you were following the AWS Re:invent keynote yesterday there were several data specific announcements including Aurora DSQL and S3 Tables . Wanting to check them out, I downloaded the latest AWS CLI 2.

Read more

Migrating off of WordPress - A Simplified Stack

The ongoing drama between Wordpress v WP Engine continues to cross my reading list, but I have permanently removed WordPress from my website. I have finally transitioned away from the complex Linux/Apache/MySQL/PHP (LAMP) stack required for self-hosting WordPress on my professional website.

Read more