Auditing your MySQL Data

I was asked recently by a client to help with providing a history of data in certain tables. Like most problems, there is no one single solution, and in this case there are several possible solutions. I was able to provide a database specific only solution, with just minimal impact to the existing schema.

Here is my approach, your feedback and alternative input as always a welcome.

The problem

Client: I want to keep a history of all changes to two tables, and have a means of viewing this history.

For the purposes of this solution, we will use one table, called ‘customer’ from the Sakila Sample Database.

Solution

For tables to be audited, we will introduce a new column called ‘audit_id’ which is NULLABLE, and hopefully will not affect any existing INSERT statements providing column naming (a Best Practice) is used.
We do this to ensure that the Audit Table has both the same structure (number and ordering of columns), and can have a Primary Key defined.

Schema Preparation

mysql> USE sakila;
mysql> ALTER TABLE customer ADD audit_id INT UNSIGNED NULL;

We can then create an Audit Schema to store the Audit Table. This helps to ensure a clean schema and support for appropriate backup and recovery. Using a standard of suffixing existing schemas with ‘_audit';

mysql> CREATE DATABASE IF NOT EXISTS sakila_audit;
mysql> USE sakila_audit;
mysql> SET FOREIGN_KEY_CHECKS=0;   # (1)
mysql> CREATE TABLE customer LIKE sakila.customer;
mysql> ALTER TABLE customer DROP KEY `email`;   # (2)
mysql> # Foreign Keys (3)
mysql> ALTER TABLE customer
           DROP PRIMARY KEY,
           MODIFY customer_id SMALLINT UNSIGNED NOT NULL,  # (4)
           MODIFY audit_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

NOTES:
(1) Due to Foreign Key Constraints,
(2) We need to remove all existing UNIQUE Keys, I use an I_S SQL as shown below to find these programmatically, you can also view via SHOW CREATE TABLE customer;
(3) It appears that Foreign Keys are not created with the LIKE syntax. This may change in the future, so this note to check appropriately.
(4) We need to drop the primary key, but as this involves an AUTO_INCREMENT column we need to alter this as well, and this step involves naming the primary key column as well, which will vary per table.

We now have an Audit table in a separate schema that has the same columns. Part of the process of any new Schema Releases is to ensure these tables are kept in sync. An appropriate I_S statement could be used for verification. In this case, the support for a TRIGGER on Instance Startup to run, and throw an error to the error log or system error log (possible via UDF) would enable a balance check here.

Access to Audit Information

We provide a VIEW to the Audit table for history purposes. In addition, we use this for simplification of trigger management.

mysql> CREATE VIEW sakila.customer_history AS SELECT * FROM sakila_audit.customer;

NOTE: No index optimization has been performed on the Audit Table. It would be anticipated that existing Indexes could indeed be dropped and replaced with new indexes appropriate for data access.

Trigger Creation

In order to keep a copy of all data, we introduce two database triggers to manage a copy of all data in the history table. It is possible to say that History consists of the Current Version (in the base table) and all previous versions in the Audit Table. This requires 3 triggers. An alternative is to keep a full copy of all versions including current in the Audit Table. This requires 2 triggers, and takes more diskspace, however is a simpler and cleaner implementation.

USE sakila;
DELIMITER $$
DROP TRIGGER IF EXISTS customer_ari$$
CREATE TRIGGER customer_ari
AFTER INSERT ON customer
FOR EACH ROW
BEGIN
  INSERT INTO customer_history
  SELECT * FROM customer
  WHERE  customer_id = NEW.customer_id;
END;
$$
DROP TRIGGER IF EXISTS customer_aru$$
CREATE TRIGGER customer_aru
AFTER UPDATE ON customer
FOR EACH ROW
BEGIN
  INSERT INTO customer_history
  SELECT * FROM customer
  WHERE  customer_id = NEW.customer_id;
END;
$$
DELIMITER ;

NOTE: I do not generally like to use ‘SELECT *’ however in this situation, the trigger is significantly simplified. This is of benefit if you are maintaining audit triggers on many tables. The disadvantage is you must ensure your schema tables (e.g. sakila and sakila_audit) are always kept in sync with the same number and order of columns. Failing to add a column to the audit database will result in an error, which is a good confirmation. Failing to add a column in the right order, may corrupt your data. Exercise caution when modifying the schema in this situation.

Testing

As with any proper coding, we need to test this. The following sample SQL was run to test on a sample database.

SET FOREIGN_KEY_CHECKS=0;
USE sakila;
TRUNCATE TABLE sakila.customer;
TRUNCATE TABLE sakila_audit.customer;
SELECT 'no customer data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'no customer history data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
INSERT INTO customer (customer_id,store_id,first_name,last_name,email,address_id,active,create_date)
              VALUES(NULL,1,'mickey','mouse',',[email protected]',1,TRUE,NOW());
SELECT 'customer data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
INSERT INTO customer(customer_id,store_id,first_name,last_name,email,address_id,active,create_date)
              VALUES(NULL,1,'donald','duck',',[email protected]',1,TRUE,NOW());
SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
UPDATE customer SET email='[email protected]' where email='[email protected]';
SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
DELETE FROM customer where email='[email protected]';
SELECT 'customer data = 1 rows', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
DELETE FROM customer;
SELECT 'customer data = 0 rows', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;

Improvements

It would be great if MySQL’s Procedural Language was a little more flexible and robust. Some improvements I’d love to see that would enable a more programmatic solution as the above contains a number of dependencies in schema_name and column_name.

  • Raise Error Handling to throw errors appropriately
  • Anonymous code block support, e.g. BEGIN …code… END; and an automatic execution, not the need to create a Procedure then execute it.
  • Ability to execute dynamic SQL more easily, for example CREATE DATABASE IF NOT EXISTS @variable; or CREATE VIEW @schema.@table_name_history FROM …
  • Support for multiple type (BEFORE|AFTER INSERT|UPDATE|DELETE) triggers per table.

INFORMATION_SCHEMA Query

mysql> SET @schema='sakila';
mysql> SET @table='customer';
mysql> SELECT DISTINCT CONCAT('ALTER TABLE ', table_name, ' DROP KEY `',constraint_name,'`;') AS cmd
           FROM INFORMATION_SCHEMA.table_constraints
           WHERE constraint_schema=@schema
           AND table_name=@table
           AND constraint_type='UNIQUE';

Comments

  1. says

    Hi Ronald,

    Nice post! I have a few comments though…

    “In this case, the support for a TRIGGER on Instance Startup to run”

    Well, you can actually do this, kind of:

    http://dev.mysql.com/doc/refman/5.1/en/server-options.html#option_mysqld_init-file

    “I want to keep a history of all changes to two tables, and have a means of viewing this history.”
    and
    “An alternative is to keep a full copy of all versions including current in the Audit Table.”

    I don’t see you keeping track of the DELETEs. You can of course query it and deduce that a row must have been deleted, but it seems more sensible to ensure the history database is a full history, including DELETE events (so you can reconstruct the current situation from a history database alone). Of course that would involve another TRIGGER.

    Another thing I was wondering about is why you don’t add at least a datetime or timestamp, and possibly other data such as current user, user host etc. to the audit tables. I mean, if you see something strange, You’d like to know when it happened, and who did it, no?

    “I do not generally like to use ‘SELECT *’ however in this situation, the trigger is significantly simplified. This is of benefit if you are maintaining audit triggers on many tables.”

    mm, personally I don’t like this trigger code that much…For each updated or inserted row you select the current row from the table…why? All data for the current row is available in the NEW.% columns, so there is no need to query anything…If is of course tedious to type all that, but you can easily generate the TRIGGER code from the I_S. Once you write that script, you can easily extend the history to include more tables as a bonus…

    see: http://www.quest-pipelines.com/newsletter-v6/1005_A.htm

    (spirit of the code is there, not sure if everything works)

    kind regards,

    Roland

  2. L. says

    Hello,

    The audit_id column is not necessary, you only need it in the history table as a pkey, so why add a column that does not contain data to every table you audit ?

    SELECT * in this case is just fine, but as you will need to update your triggers (programatically) every time you change structure, and as such a “create audit table /. database” script is meant to be scripted rather than written, I don’t think there’s any difference between select field1,field2,… and select *

    About the dynamic SQL, looks like it’s possible today (didn’t read in much detail but — http://www.hbensalem.com/mysql/mysql-database-table-history-using-schemas-and-triggers/#comment-2138
    )

    Same comment as for the others, you should consider implementing triggers to update your triggers on changes in the mysql metadata tables, something that you can in fact push to the limit in order to have an audit table and all the required triggers created on every create table, alter table command ;)

  3. L. says

    Just read the comment above .. not including timestamp (/user where applicable) is of course a crime, select * I insist is perfectly fine, using the in-memory data is of course better, and the link doesn’t work anyway.