I was asked recently by a client to help with providing a history of data in certain tables. Like most problems, there is no one single solution, and in this case there are several possible solutions. I was able to provide a database specific only solution, with just minimal impact to the existing schema.
Here is my approach, your feedback and alternative input as always a welcome.
The problem
Client: I want to keep a history of all changes to two tables, and have a means of viewing this history.
For the purposes of this solution, we will use one table, called ‘customer’ from the Sakila Sample Database .
Solution
For tables to be audited, we will introduce a new column called ‘audit_id’ which is NULLABLE, and hopefully will not affect any existing INSERT statements providing column naming (a Best Practice) is used.
We do this to ensure that the Audit Table has both the same structure (number and ordering of columns), and can have a Primary Key defined.
Schema Preparation
mysql> USE sakila; mysql> ALTER TABLE customer ADD audit_id INT UNSIGNED NULL;
We can then create an Audit Schema to store the Audit Table. This helps to ensure a clean schema and support for appropriate backup and recovery. Using a standard of suffixing existing schemas with ‘_audit';
mysql> CREATE DATABASE IF NOT EXISTS sakila_audit; mysql> USE sakila_audit; mysql> SET FOREIGN_KEY_CHECKS=0; # (1) mysql> CREATE TABLE customer LIKE sakila.customer; mysql> ALTER TABLE customer DROP KEY `email`; # (2) mysql> # Foreign Keys (3) mysql> ALTER TABLE customer DROP PRIMARY KEY, MODIFY customer_id SMALLINT UNSIGNED NOT NULL, # (4) MODIFY audit_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
NOTES:
(1) Due to Foreign Key Constraints,
(2) We need to remove all existing UNIQUE Keys, I use an I_S SQL as shown below to find these programmatically, you can also view via SHOW CREATE TABLE customer;
(3) It appears that Foreign Keys are not created with the LIKE syntax. This may change in the future, so this note to check appropriately.
(4) We need to drop the primary key, but as this involves an AUTO_INCREMENT column we need to alter this as well, and this step involves naming the primary key column as well, which will vary per table.
We now have an Audit table in a separate schema that has the same columns. Part of the process of any new Schema Releases is to ensure these tables are kept in sync. An appropriate I_S statement could be used for verification. In this case, the support for a TRIGGER on Instance Startup to run, and throw an error to the error log or system error log (possible via UDF) would enable a balance check here.
Access to Audit Information
We provide a VIEW to the Audit table for history purposes. In addition, we use this for simplification of trigger management.
mysql> CREATE VIEW sakila.customer_history AS SELECT * FROM sakila_audit.customer;
NOTE: No index optimization has been performed on the Audit Table. It would be anticipated that existing Indexes could indeed be dropped and replaced with new indexes appropriate for data access.
Trigger Creation
In order to keep a copy of all data, we introduce two database triggers to manage a copy of all data in the history table. It is possible to say that History consists of the Current Version (in the base table) and all previous versions in the Audit Table. This requires 3 triggers. An alternative is to keep a full copy of all versions including current in the Audit Table. This requires 2 triggers, and takes more diskspace, however is a simpler and cleaner implementation.
USE sakila; DELIMITER $$ DROP TRIGGER IF EXISTS customer_ari$$ CREATE TRIGGER customer_ari AFTER INSERT ON customer FOR EACH ROW BEGIN INSERT INTO customer_history SELECT * FROM customer WHERE customer_id = NEW.customer_id; END; $$ DROP TRIGGER IF EXISTS customer_aru$$ CREATE TRIGGER customer_aru AFTER UPDATE ON customer FOR EACH ROW BEGIN INSERT INTO customer_history SELECT * FROM customer WHERE customer_id = NEW.customer_id; END; $$ DELIMITER ;
NOTE: I do not generally like to use ‘SELECT *’ however in this situation, the trigger is significantly simplified. This is of benefit if you are maintaining audit triggers on many tables. The disadvantage is you must ensure your schema tables (e.g. sakila and sakila_audit) are always kept in sync with the same number and order of columns. Failing to add a column to the audit database will result in an error, which is a good confirmation. Failing to add a column in the right order, may corrupt your data. Exercise caution when modifying the schema in this situation.
Testing
As with any proper coding, we need to test this. The following sample SQL was run to test on a sample database.
SET FOREIGN_KEY_CHECKS=0; USE sakila; TRUNCATE TABLE sakila.customer; TRUNCATE TABLE sakila_audit.customer; SELECT 'no customer data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer; SELECT 'no customer history data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer_history; INSERT INTO customer (customer_id,store_id,first_name,last_name,email,address_id,active,create_date) VALUES(NULL,1,'mickey','mouse',',mickey@disney.com',1,TRUE,NOW()); SELECT 'customer data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer; SELECT 'customer history data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer_history; INSERT INTO customer(customer_id,store_id,first_name,last_name,email,address_id,active,create_date) VALUES(NULL,1,'donald','duck',',d.duck@warnerbros.com',1,TRUE,NOW()); SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer; SELECT 'customer history data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer_history; UPDATE customer SET email='donaldduck@warnerbros.com' where email='dduck@warnerbros.com'; SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer; SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history; DELETE FROM customer where email='donaldduck@warnerbros.com'; SELECT 'customer data = 1 rows', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer; SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history; DELETE FROM customer; SELECT 'customer data = 0 rows', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer; SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
Improvements
It would be great if MySQL’s Procedural Language was a little more flexible and robust. Some improvements I’d love to see that would enable a more programmatic solution as the above contains a number of dependencies in schema_name and column_name.
- Raise Error Handling to throw errors appropriately
- Anonymous code block support, e.g. BEGIN …code… END; and an automatic execution, not the need to create a Procedure then execute it.
- Ability to execute dynamic SQL more easily, for example CREATE DATABASE IF NOT EXISTS @variable; or CREATE VIEW @schema.@table_name_history FROM …
- Support for multiple type (BEFORE|AFTER INSERT|UPDATE|DELETE) triggers per table.
INFORMATION_SCHEMA Query
mysql> SET @schema='sakila'; mysql> SET @table='customer'; mysql> SELECT DISTINCT CONCAT('ALTER TABLE ', table_name, ' DROP KEY `',constraint_name,'`;') AS cmd FROM INFORMATION_SCHEMA.table_constraints WHERE constraint_schema=@schema AND table_name=@table AND constraint_type='UNIQUE';