It’s important that for any software application good standards exist. Standards ensure a number of key considerations. Standards are necessary to enforce and provide reproducible software and to provide a level of quality in a team environment, ease of readability and consistency.
If you were going to create a MySQL Naming Standard you have to make a number of key decisions. Generally there is no true right or wrong, however my goals tend towards readability and simplicity. In 2 decades of database design I’ve actually changed my preference between some of these points.
1. Pluralism
Option 1
All database objects are defined in the logical form, that being singular.
For example: box, customer, person, category, user, order, order_line product, post, post_category
Option 2
For database tables & views, objects are defined in plural. For columns, objects are defined singular.
For example: boxes, customers, people, categories, users, orders, order_lines, products, posts, post_categories
Issues
Inconsistency between table name and column name, when using plural. Column names simply are not plural.
When the plural of the name is a completely different spelled word. For example a table of People, and a primary key of PersonId.
When the plural rule is not adding ‘s’, for example replacing ‘y’ with ‘ies’ as with Category.
Strict rule necessary for relationship and intersection tables. Generally, only the last portion is plural.
What about other objects, such as stored procedures for example.
2. Case Sensitivity
Option 1
All database objects should be specified as lowercase only. Words are separated with ‘underscores’.
For example: customer, customer_history, order, order_line, product, product_price, product_price_history
Option 2
All database objects use CamelCase. Words are separated via Case.
For example: Customer, CustomerHistory, Order, OrderLine, Product, ProductPrice, ProductPriceHistory
Issues
Some database products have restrictions here, Oracle for example, UPPERCASES all objects. MySQL allows for both, except for some Operating Systems that does not support Mixed Case properly (e.g. Microsoft)
3.Key names
In a related post I will be discussing natural and surrogate keys. For this purpose, we will assume you are using surrogate keys.
All keys will have a standard name. For Example: id, key, sgn (system global number)
When referencing primary keys and foreign keys, what standard is used?
Option 1
The primary key is the same name across all tables, making it easy to know the primary key of a table.
For foreign keys, the name is prefixed with the table name or appropriate table alias.
For Example: id. The foreign key would be customer_id, order_id, product_id
Option 2
The primary key is defined as unique across the system, and as such the foreign key is the same as the primary key.
For Example: customer_id, order_id, product_id
Issues
When self referencing columns to a table, having a standard is also appropriate, for example parent_product_id.
4. Specific Object Names
Option 1
For the given type of object, have a standard prefix or suffix.
For Example, all tables are prefixed with tbl, all views are suffixed with _view, all columns for a given table are prefixed with table alias.
Option 2
Don’t prefix or suffix an object with it’s object type.
5. Reserved Words
Option 1
Don’t use them. When a word is reserved, find a more description name.
E.g. system_user, order_date,
Option 2
Allow them.
For Example: user, date, group, order
Issues
A number of database systems do not allow the use of reserved words, or historically have not. Sum such as MySQL for example, allow reserved words, but only when additional quoting is used.
6. Abbreviations
This indeed could be a entire topic it’s self. In simplicity, do you use abbreviations other then the most common and everybody knows abbreviations (to the point you don’t know what the abbreviation actually is in real life type abbreviations), or do you not.
I use the example of ‘url’. How many people actually know what url stands for. This is a common abbreviation.
Option 1
For objects, use abbreviations when possible. Don’t use it for key tables, but for child and intersection tables.
For Example: invoice, inv_detail, inv_id,
Option 2
Avoid abbreviations at all costs.
invoice, invoice_detail, invoice_id,
Issues
Unless you have a large schema (e.g. > 500 tables) the use of abbreviations should not be needed given the relative sizes of objects in modern databases.
My Recommendations
There are a lot more considerations then these few examples for naming standards, however as with every design, it’s important to make a start and work towards continual improvement.
- All objects are singular (very adamant, people/person, category/categories, sheep/sheep for tables, but not columns – simplicity wins as English is complex for plurals).
- All objects are lowercase and use underscore ‘_’ (I really like CamelCase for readability, but for consistency and simplicity, unfortunately lowercase is easier).
- All primary key’s are defined as unique across the system.
- Don’t use prefixes/suffixes to identity object types.
- Never use reserved words.
- Don’t use abbreviations except for the most obvious.
At the end of the day, I will work with what standard is in place. What I won’t work with is, when there is no documented, accountable standard.
Arjen Lentz says
Excellent post, thanks. Useful to see all the options expanded on.
There are of course the conventions used by for instance Ruby on Rails; rails has a complete pluraliser built-in so it can actually work things out and do “the right thing”, automatically. Having the pluralisation-issue out of the way, rails uses singular for columns and plural for tables, which I do think makes a lot of sense when reviewing a schema (but I also agree with you that it can be a pest with English linguistic quirks and typo).
One question. Item 1 in your recommendations is not entirely clear to me, you say you want all objects as singular but then note category/categories? and say something about “not columns”. Can you rephrase for disambiguation? cheers
Bill Karwin says
Hi Ronald, have you checked out Joe Celko’s book “SQL Programming Style”? Celko summarizes metadata naming guidelines from ISO-11179.
I like your recommendations except that I make tables plural, while columns are singular (except where its a measurement e.g. “ounces”).
As for procedures, I don’t think you need to worry about singular vs. plural because procedures should be based on a verb, not a noun.
Roland Bouman says
Hi!
No doubt a topic that is guaranteed to lead to a scroll of yes/no arguments Anyway, I think I agree to most of your recommendations, but:…
“Column names simply are not plural.”
Like Bill mentioned, sometimes the column conveys a datum that denotes some amount, quantity or number of occurrences. Personally I would not use his example “ounces” (I would call that “weight” and document that the unit of measurement is “ounce”), but there are cases, especially where the datum is some kind of count where I wouldn’t object against a plural:
vehicle_type (
…
, wheels tinyint unsigned
…
)
“wheel_count” or “wheel_quantity” just sound strange to me, although I can imagine using “number_of_wheels” to get a singluar (“number_of_…”) after all.
I don’t really like your naming schemes for foreign key columns. My main gripe is that these don’t deal rigorously with multiple foreign keys between two tables, for example as in sakila.film (language_id, original_language_id), and I don’t like thinking of an ad-hoc name when you add a foreign key later on. The real solution to this problem would be to always use the role name (rather than a table name) for the foreign key, and derive the foreign key column from the foreign key name. Example:
purchase_order (
…
, customer_address_id int unsigned not null
, shipping_address_id int unsigned not null
, billing_address_id int unsigned not null
…
, constraint fk_purchase_order_customer_address (customer_address_id) references address (address_id)
, constraint fk_purchase_order_shipping_address (shipping_address_id) references address (address_id)
, constraint fk_purchase_order_billing_address (billing_address_id) references address (address_id)
)
Your example of self-referencing tables is just special case where you get into trouble. I mean, using ‘parent’ like you did may adequately describe a role name, but there are cases where multiple hierarchies may exist within the same table, so that you still need to qualify the different parents:
employee (
…
, hierarchical_manager_id — the traditional ‘real’ manager that actually ‘outranks’ this employee
, operational_manager_id — the manager that directly orders this employee (which may or may not actually ‘outrank’ this employee
, functional_manager_id — the manager that can advise this employee and is responsible for its functional performance (functional supervisor)
…
)
(for a more tangible example, think of a genealogy database where each person has a fk to its mother and to its father)
kind regards,
Roland
Bill Karwin says
I agree that more descriptive foreign key names are generally better, but I think the point is that one should not call the primary key in every table “id”. Using a more descriptive and unique name for each primary key is useful, and it also neatly avoids some column name collisions in join queries.
Roland Bouman says
Hi Bill, Ronald,
“one should not call the primary key in every table “id””
…
“it also neatly avoids some column name collisions in join queries”
Well, to some extent I agree: by definition, prefixing columns lessens the number of possible collisions. However, for me in practice, collisions don’t pose any problem to me as I always use table aliases and qualified columns anyway. I’m aware you could argue that that’s a chicken-egg situation (“I’m forced to using them, that’s why I use them”), but I think there is another issue at hand here.
Let me put it this way: suppose I would prefix the primary key column with the table name, and make them like %table_name%_id. Now, this will allow me to SELECT primary key columns from several tables, and they won’t collide. However, I usually find I don’t want to select those – meaningless, surrogate – primary key columns. What I do want to select a good deal more often are names, dates, descriptions, and codes.
So I guess my question is, why make such a special case out of primary key columns? If you are determined to prevent these name collisions, shouldn’t you be prefixing *all* columns? For example, all those columns called “name” and “status” and “description” etc. ? It seems to me you should either prefix none of them, or all of them.
And, if you decide to prefix all your columns, is that really practically different from what I do? I prefix all my columns too – I just happen to prefix them with table qualifiers.
I can think of just one reason why you would want to use prefixed primary key columns. If you are fond of the “named columns join” (the one with USING) then it’s convenient to give foreign key columns the same name as their corresponding primary key column. However, this then forces you to write similar joins over distinct foreign keys between the same two tables using different methods: one using the named columns join (for the one foreign key that could be named using the same name as the primary key) and one with the regular explicit qualified join (with ON). I understand that many people like the named columns join despite this, but I really dislike having to solve the exact same problem with two different syntaxes.
The only ‘fancy’ join method that would make sense to me is one that would allow you to somehow use the foreign key constraint name – something like:
FROM fk_table PARENT() IN pk_table
and
FROM pk_table CHILDREN() IN fk_table
that method would work for any imaginable foreign key without the need for special naming schemes, and it would also be resilient to any structural changes to the key.
Once we’d have the possibility to refer to entire constraints, why not allow references to them for other clauses too, like:
GROUP BY KEY()
or
GROUP BY KEY(, )
(PS1: the syntax is not important, the concept of using key names to denote a set of columns is the main thing)
(PS2: not trolling just offering food for thought – this is one of those topics where I don’t wish to pretend there’s one right way)
Roland Bouman says
window.status = document.cookie
Roland Bouman says
Darn, i had a bunch of <constraint-name> thingies between the parenthesis in my syntax examples, but those get stripped out…
So, I meant:
FROM fk_table PARENT(<fk-constraint-name>) IN pk_table
FROM pk_table CHILDREN(<fk-constraint-name>) IN fk_table
and
GROUP BY KEY(<table-name>)
GROUP BY [UNIQUE | FOREIGN] KEY(<table-name>, <constraint name>)
Bill Karwin says
Yes, one reason to have descriptive primary key columns is to allow the use of a USING() clause. This is a concise way to specify join conditions, but you’re right that it only works if the column names are the same in both tables, and you’re doing an equi-join. Isn’t this a common enough kind of join that it’d be handy to have some shorthand?
You can also omit the column name in a foreign key declaration if the columns have the same name. It’s no big deal, it’s just a small convenience.
Another reason to eschew the ever-present “id” column is that it conveys no meaning except for the fact that it’s by convention a primary key. Using a more descriptive column name helps readability. In cases where the primary key is a natural key instead of a pseudokey, you should name the column to describe its contents.
I don’t think it’s necessary to prefix all columns, because they’re not nearly as likely to collide. But if every table names its primary key the same thing, they are certain to collide.
Nor do I advocate always using the table name as a prefix in the primary key column’s name. For one thing, it shouldn’t be plural (e.g. Categories.category_id is preferable to Categories.categories_id). It also doesn’t need to be based exactly on table name in all cases.
Anyway, all I meant to say is that “id” isn’t a good choice. Frameworks (like Rails) that insist that every table must have a pseudokey named “id” should be considered harmful!
I just helped a user today who had defined a many-to-many table as follows:
CREATE TABLE (
id INTEGER AUTO_INCREMENT PRIMARY KEY,
article_id INTEGER REFERENCES Articles(id),
keyword_id INTEGER REFERENCES Keywords(id)
);
Ouch!
Bill Karwin says
Your idea about using constraint names is interesting. It’s nice that it supports tables that reference a parent table using multiple foreign keys, so the columns are bound to use different names.
But unfortunately, your syntax is not currently part of the SQL standard nor is it implemented in any SQL parser, as far as I know. And it assumes that the storage engine supports named DRI constraints.
Roland Bouman says
Hi Bill!
(heh. it seems we really do disagree Anyway, it’s fun and interesting to hear your point of view. Comments here to clarify mine – not to bash yours)
“Isn’t this a common enough kind of join that it’d be handy to have some shorthand”
Well, it means in a certain way that you are prepared to say: “of these 2 foreign keys from table A to table B, I think this one here is special enough to give its foreign key column the same name as the pk column of the parent table.”
“it conveys no meaning except for the fact that it’s by convention a primary key”
Yes. But in a way, that is exactly why I like it better than _id. Let me explain. Suppose I see a table called foo, having a name and a description and also the much dreaded id columns. Then I look at it as:
“name in foo, that must be the name of one foo instance, description in foo, that must describe the name of one foo instance, id – darn, that must be the surrogate key that identifies one foo instance”
In your naming scheme, the table would be called “foos” rather than “foo”, and the surrogate primary key would be called “foo_id”. When I look at that I’d say:
“name in foos, that is the name of one foo instance, description: description of one foo instance, foo_id that’s the foo_id of one foo instance – oh no, that must be the name of the id of one foo instance”
So to me, adding “foo” as a prefix is just redundant – to me it’s understood that the foo.id is the id of a foo just like foo.name is the name of a foo.
“I don’t think it’s necessary to prefix all columns, because they’re not nearly as likely to collide. But if every table names its primary key the same thing, they are certain to collide.”
Well, “…not as nearly…” means you’d be needing qualified columns in your queries at least some of the time. I argued in my previous comment that practically, there is not so much difference in prefixing the columns in the table def as compared to qualifying them in the query itself (which is what I prefer). Like I said, it’s almost never the id columns that collide in SELECT lists – I am more interested in the non-id columns there than in the id columns (but it doesn’t matter anyway because I always use qualified columns)
“Nor do I advocate always using the table name as a prefix in the primary key column’s name. ”
Well, that would depend on your table naming schema – plural or singular. To me, the plural doesn’t convey any information – its understood the table contains a multiple of <table-name> instances, so all the plural does is lengthen the identifiers. Add in the fact that I can’t easily mechanically derive the plural from the singular, it’s a pretty obvious choice to me – If i have the choice I stick with singular, even if that means living on the edge and defying any ISO standard or Celko book.
“I just helped a user today who had defined a many-to-many table as follows:”
Heh, I recall we discussed this earlier. Actually, this is exactly what I do, and for a reason. In an evolving schema, it’s not uncommon for a many to many relationship to gain attributes of its own, or even at some point to grow children of its own (at which point it apparently was more than just a relationship – apparently it was actually an entity of itself) Rather than adding the surrogate id later on, I do so up front. Having an extra id typically doesn’t hurt in case you really do have a pure intersection table (there may be exceptions if you are dealing with really, really large data sets).
“unfortunately, your syntax is not currently part of the SQL standard nor is it implemented in any SQL parser”
True. The ppl at the standards committee were too busy inventing NATURAL joins too listen to me :p
“it assumes that the storage engine supports named DRI constraints”
Not necessarily. It’s enough for the server to remember all metadata, and use it. It does not actually have to enfore it to look up how to expand this syntax into regular conditions.
regards,
Roland
Rob Wultsch says
Where is the love for Hungarian notation on column names? #4 is some what similar but would not get you:
cNameL
iAmount
dDatetime
(this lovely notation is enforced where I work, and those columns names are not made up)…
Roland Bouman says
Hi Rob,
this looks like it was created by a laid off VB programmer that moved on to pursue a career in physical database design.
What’s the L doing in cNameL ?
regards, Roland
Rob Wultsch says
Hi Roland,
The L in cNameL is for last name. Also worth note is that dDatetime is actually of the type TIMESTAMP (IIRC).
How awesome is that?
Mau says
Hello,
I really liked the article.
In object oriented language context and supposing that you are going to map every table/entity to a class, would you still propose the same?
2. Camel Case might be better?
3. using just ‘id’ everywhere might be appealing.
Any comment on that?
Thank you.
Aditya says
find it confusing
1. Pluralism
Option 1
All database objects are defined in the logical form, that being singular.
For example: box, customer, person, category, user, order, order_line product, post, post_category
Option 2
For database tables & views, objects are defined in plural. For columns, objects are defined singular.
For example: boxes, customers, people, categories, users, orders, order_lines, products, posts, post_categories
With my table name be box or boxes , customer or customers?
sadly english is not first language!