Improving Open Source Databases – WordPress

As part of both my UltimateLAMP side project, and my greater involvement with MySQL Workbench, I’ve been wanting to review and document the database schemas of Open Source products.

Indeed, as part of discussions with Mike Zinner of MySQL AB at the recent MySQL Users Conference, I suggested an idea used in the Java World, for example by Clover and Agitar, where to promote the usefullness of their respective tools (in this case code coverage), they provide results against Open Source Products. For an example check out Clover Examples.

With MySQL Workbench, to get some greater exposure of the use of graphical Data Modelling tool, I mentioned the idea of providing a respository of schemas, may help in exposure, usage and feedback of MySQL Workbench. Indeed my work is towards this being added to MySQL Forge, however I’ve found that at my first indepth work, using WordPress 2.0.2 has been a less then successful experience.

The schema would rate a 2 out of 5 on my scale of optimal database schemas. Here are some of my initial findings.

  • Inconsistent naming standards
    • ID uppercase for primary keys in some tables, but not in others.
    • different standards for primary keys, some id, some [table]_id
    • Inconsistent identification of foreign keys (e.g. post_parent). All keys should have _id.
    • Inconsistent datatypes for optional foreign keys (e.g. default of 0 rather then NULL)
    • Inconsistent column naming standards, some tables prefix columns with table name or table like name, (e.g. post_), but not all columns. Some tables has multiple prefix values (e.g. cat_, category_)
  • Poor usage of datatypes. Examples include BIGINT(20) for primary keys, TEXT for post title, INT(10) for columns which as values a TINYINT can support.
  • Inconsistent use of enums (e.g. Y/N for some values, and yes/no for others)
  • No Referential Integrity support (ie. Foreign Keys), limiting the benefits of a modelling tool showing table relationships
  • Different rules for primary keys (most tables have a suggorate key, but not all)

I am actually working on A MySQL Guide to Database Modelling, Design and Support, which covers a lot of these points and more. I’m some way from completing this document, but this initial research of the WordPress product highlights it’s importance to the open source community.

Of course we can just make noise about what we don’t like, or we can do something about it. I’m submitting my ideas, suggestions and code changes to the project. What is done about it is outside of my control, but I’ve at least given my input to this Open Source Product.

WordPress 2.0.2 Schema SQL (taken from a mysqldump after installation)

Here is a graphical representation of the Data Model from MySQL Workbench.
NOTE: My MySQL Workbench will not enable me to do a Export Image, so I’ll need to add this at a later time

Introducing Referential Integrity

Unfortunately this is no automated means of creating appropiate Referential Integrity for a schema. With more strict naming standards, it would be possible to reconstuct this information generally.

The approach to create these new rules were performed manually, but a tool could be developed to provide approximations for a number of these.

# Implement Foreign Keys. Changes include:
#  - Appropiate Storage Engine
#  - ensure column is of same definition as matching primary key (e.g. datatype + unsigned)
#  - create index for foreign key
#  - ensure optional keys use NULL

ALTER TABLE wp_categories ENGINE=InnoDB;
ALTER TABLE wp_comments ENGINE=InnoDB;
ALTER TABLE wp_linkcategories ENGINE=InnoDB;
ALTER TABLE wp_links ENGINE=InnoDB;
ALTER TABLE wp_options ENGINE=InnoDB;
ALTER TABLE wp_post2cat ENGINE=InnoDB;
ALTER TABLE wp_postmeta ENGINE=InnoDB;
ALTER TABLE wp_posts ENGINE=InnoDB;
ALTER TABLE wp_usermeta ENGINE=InnoDB;
ALTER TABLE wp_users ENGINE=InnoDB;

ALTER TABLE wp_usermeta MODIFY user_id  bigint(20) UNSIGNED NOT NULL DEFAULT '0';
ALTER TABLE wp_usermeta ADD FOREIGN KEY usermeta_users_fk (user_id) REFERENCES wp_users(ID) ON DELETE CASCADE;

ALTER TABLE wp_posts MODIFY post_author BIGINT(20) UNSIGNED NOT NULL;
CREATE INDEX post_author ON wp_posts(post_author);
ALTER TABLE wp_posts ADD FOREIGN KEY posts_users_fk (post_author) REFERENCES wp_users(ID) ON DELETE CASCADE;

ALTER TABLE wp_posts MODIFY post_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_posts SET post_parent = NULL WHERE post_parent = 0;
CREATE INDEX post_parent ON wp_posts(post_parent);
ALTER TABLE wp_posts ADD FOREIGN KEY posts_posts_fk (post_parent) REFERENCES wp_posts(ID);

ALTER TABLE wp_postmeta MODIFY post_id  bigint(20) UNSIGNED NOT NULL DEFAULT '0';
ALTER TABLE wp_postmeta ADD FOREIGN KEY postmeta_posts_fk (post_id) REFERENCES wp_posts(ID) ON DELETE CASCADE;

ALTER TABLE wp_categories MODIFY cat_ID BIGINT(20) UNSIGNED NULL DEFAULT NULL AUTO_INCREMENT;
ALTER TABLE wp_categories MODIFY category_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_categories SET category_parent = NULL WHERE category_parent = 0;
ALTER TABLE wp_categories ADD FOREIGN KEY categories_categories_fk (category_parent) REFERENCES wp_categories(cat_ID);

ALTER TABLE wp_post2cat MODIFY rel_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_post2cat MODIFY post_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_post2cat MODIFY category_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_post2cat ADD FOREIGN KEY post2cat_posts_fk (post_id) REFERENCES wp_posts(ID) ON DELETE CASCADE;
ALTER TABLE wp_post2cat ADD FOREIGN KEY post2cat_categories_fk (category_id) REFERENCES wp_categories(cat_ID) ON DELETE CASCADE;
ALTER TABLE wp_comments MODIFY user_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_comments SET user_id = NULL WHERE user_id = 0;
CREATE INDEX user_id ON wp_comments(user_id);
ALTER TABLE wp_comments ADD FOREIGN KEY comments_users_fk (user_id) REFERENCES wp_users(id) ON DELETE CASCADE;
ALTER TABLE wp_comments MODIFY comment_post_ID BIGINT(20) UNSIGNED NULL DEFAULT NULL;
ALTER TABLE wp_comments ADD FOREIGN KEY comments_posts_fk (comment_post_ID) REFERENCES wp_posts(ID) ON DELETE CASCADE;

ALTER TABLE wp_comments MODIFY comment_parent BIGINT(20) UNSIGNED NULL DEFAULT NULL;
UPDATE wp_comments SET comment_parent = NULL WHERE comment_parent = 0;
CREATE INDEX comment_parent ON wp_comments(comment_parent);
ALTER TABLE wp_comments ADD FOREIGN KEY comments_comments_fk (comment_parent) REFERENCES wp_comments(comment_id);

ALTER TABLE wp_linkcategories MODIFY cat_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_links MODIFY link_category BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_links ADD FOREIGN KEY links_category_fk (link_category) REFERENCES wp_linkcategories(cat_id);

While there are no column name changes, by default the WordPress code should operate with this revised schema. Issues would include a poor delete strategy that still voilates the liberal constraints now enforced. Special handling of 0 for optional columns when the value is now NULL may also be a problem.

Revised Data Model in MySQL Workbench.
NOTE: My MySQL Workbench will not enable me to do a Export Image, so I’ll need to add this at a later time

Introducing improved Standards

Of course, I’d like to see the schema improve, here are my first draft ideas for improvements of the schema.

# Naming Improvements ===================================================================================
# Rename to lowercase id
ALTER TABLE wp_users CHANGE ID id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_posts CHANGE ID id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_categories CHANGE cat_ID cat_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_ID comment_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_comments CHANGE comment_post_ID comment_post_id BIGINT(20) UNSIGNED NOT NULL;

# Include _id for all Foreign Keys
ALTER TABLE wp_posts CHANGE post_author post_author_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_posts CHANGE post_parent post_parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_categories CHANGE category_parent category_parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_comments CHANGE comment_parent comment_parent_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;
ALTER TABLE wp_links CHANGE link_category link_category_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;

# Primary Key Name Standardisation
ALTER TABLE wp_posts CHANGE id post_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_users CHANGE id user_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT;

# Foreign Key Standardisation
ALTER TABLE wp_categories MODIFY category_parent_id parent_id BIGINT(20) UNSIGNED NOT NULL;
ALTER TABLE wp_comments CHANGE comment_parent_id parent_id BIGINT(20) UNSIGNED NULL DEFAULT NULL;

# Other Column Standaisations
ALTER TABLE wp_categories MODIFY cat_name category_name VARCHAR(55) NOT NULL;

# Column width improvements
ALTER TABLE wp_categories MODIFY category_count MEDIUMINT(5) NOT NULL DEFAULT '0';
ALTER TABLE wp_posts MODIFY post_title VARCHAR(255) NOT NULL;
ALTER TABLE wp_linkcategories MODIFY cat_name VARCHAR(255) NOT NULL;
ALTER TABLE wp_linkcategories MODIFY cat_id TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE wp_options MODIFY autoload ENUM('Y','N') NOT NULL DEFAULT 'Y';

# Obvious Index Improvements
# Make login a unique index
DROP INDEX user_login_key ON wp_users;
CREATE UNIQUE INDEX  user_login_key  ON wp_users(user_login);

These will require code changes and adequate testing, something I’m not hopeful would happen unless I drove it, but you never know, the next time I review the WordPress Schema I might be impressed.

Conclusion

This is just a broadstroke first review of the schema. As part of all good design, testing of the code, further review and refinement based on better understanding of the product, and also peer review are all important steps not yet undertaken.

Comments

  1. says

    This has been a frustration of mine in the past as well. Have you contacted the WordPress developers about incorporating some of your changes? I also would be willing to assist WP in doing some schema and indexing optimizations. Perhaps we should contact them together? Shoot me an email :)

    -Jay

  2. says

    This looks interesting.

    It would be better to look at the current database usage in the development version of WordPress a number of changes have been made in trunk after a similar review of the WordPress database usage by Domas Mituzas in March (See also http://trac.wordpress.org/ticket/2604).

    If you want to submit patches they are welcome over on the WordPress Trac. Most development discussion happens on the wp-hackers list (see http://codex.wordpress.org/Mailing_lists for more info on the lists.)

    Alternatively join #wordpress/#wordpress-dev on irc.freenode.org and when the developers/hackers are around they will be open to hearing your views.