Opinions, Expertise, Passion.

Information in black and white, and sometimes some color.

Jul
15

Auditing your MySQL Data - Part 2

Continuing from my earlier post Auditing your MySQL Data, Roland has accurately pointed out that it leaves out some information that is important for auditing. The original charter was only to keep a history for the purpose of comparing certain columns, so a history was all that was needed.

Providing a history of changes forms the basis of auditing and, in keeping with my post title and intended follow-up, this is the all-important second part. However, true auditing requires additional information. This includes:

  1. When an operation was performed
  2. What operation was performed, i.e. INSERT, UPDATE or DELETE
  3. Who performed the operation

The date and operation can be determined within the database, but gathering all of this information requires interaction with the application to obtain the true user information (this can’t be determined via a trigger).

The issue then becomes a greater need to understand the design. What is the purpose of the audit data? How will it be accessed? How much complexity are you willing to accept in maintaining the data?

One alternative is to keep a separate log of audit history. The benefits are a clear and easy way to provide a history of a user’s actions, the structure of the base and audit tables can remain the same, and the triggers can remain relatively simple. However, if you want to look at the data together with its audit history, it is better to embed these columns within each table, and the triggers then have to be customized and maintained in more detail than in my original post.

When considering the progression of these points, the design process normally returns to the following conclusion. The following columns are added to the base table.

  • A create_timestamp column is added
  • A last_update_timestamp column is added
  • A last_update_user_id column is added

The create_timestamp is optional from an auditing perspective, because the last_update_timestamp of the first audit row will contain the same value. However, experience has shown this column is valuable for other design considerations.
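As a rough sketch of what these columns might look like, the following uses a hypothetical ‘member’ table. The table name, the column types and the user id value are assumptions for illustration only. Because a trigger cannot see the true application user, the application itself supplies last_update_user_id on every write.

```sql
-- Hypothetical table, for illustration only; types are assumptions.
CREATE TABLE member (
  member_id             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email                 VARCHAR(50)  NOT NULL,
  create_timestamp      DATETIME     NOT NULL,
  last_update_timestamp DATETIME     NOT NULL,
  last_update_user_id   INT UNSIGNED NOT NULL
);

-- 42 stands in for the id of the logged-in application user.
INSERT INTO member (email, create_timestamp, last_update_timestamp, last_update_user_id)
VALUES ('mickey@example.com', NOW(), NOW(), 42);

-- Every later write refreshes the timestamp and the user id.
UPDATE member
SET    email                 = 'mickey.mouse@example.com',
       last_update_timestamp = NOW(),
       last_update_user_id   = 42
WHERE  member_id = 1;
```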

The only remaining issue is the type of operation: INSERT, UPDATE or DELETE. Both INSERT and UPDATE can be inferred (the first audit row for a key is the INSERT, and later rows with changed values are UPDATEs); DELETE cannot. To keep the model simple, a common approach is to use a BEFORE DELETE trigger to insert an audit record with all the same values as the previous row, with the last_update_timestamp manually set. A DELETE can then be identified as an audit row that shows no difference in any value other than the timestamp.
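A minimal sketch of such a BEFORE DELETE trigger follows. The ‘account’ and ‘account_audit’ tables, their columns and the trigger name are hypothetical, chosen only to keep the example self-contained. Adapt them to your own base and audit tables.

```sql
-- Hypothetical base table carrying the audit columns discussed above.
CREATE TABLE account (
  account_id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email                 VARCHAR(50)  NOT NULL,
  create_timestamp      DATETIME     NOT NULL,
  last_update_timestamp DATETIME     NOT NULL,
  last_update_user_id   INT UNSIGNED NOT NULL
);

-- Hypothetical audit table: the same columns plus its own surrogate key.
CREATE TABLE account_audit (
  audit_id              INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  account_id            INT UNSIGNED NOT NULL,
  email                 VARCHAR(50)  NOT NULL,
  create_timestamp      DATETIME     NOT NULL,
  last_update_timestamp DATETIME     NOT NULL,
  last_update_user_id   INT UNSIGNED NOT NULL
);

DELIMITER $$
CREATE TRIGGER account_brd
BEFORE DELETE ON account
FOR EACH ROW
BEGIN
  -- Copy the row as it stood, changing only the timestamp to "now".
  -- The user id is that of the last update, not of whoever ran the DELETE,
  -- because the trigger cannot see the application user.
  INSERT INTO account_audit
    (account_id, email, create_timestamp, last_update_timestamp, last_update_user_id)
  VALUES
    (OLD.account_id, OLD.email, OLD.create_timestamp, NOW(), OLD.last_update_user_id);
END$$
DELIMITER ;
```

The final audit row for a deleted key then matches the row before it in every column except last_update_timestamp.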

The ultimate conclusion comes down to your application design and needs. For example, your design may include a flag or row status to indicate deletes, which are later cleaned up via a batch process, so you don’t really care about the date/time of the actual purging of data. This then negates the need for any DELETE trigger.

Again, thanks to Roland for providing a link to Putting the MySQL information_schema to Use which provides a number of SQL statements that help in the generation of Triggers to support full auditing.

You should be aware that CURRENT_USER normally serves no purpose if all changes are made via a single application user, because every change is then recorded against the same database account.

At this point you also have another design consideration. Do you introduce a procedure to re-create the triggers by automated means for each schema change, or do you manually maintain the triggers with schema changes? With either approach, additional checking and verification is necessary to ensure your triggers are correctly configured.

Posted under Databases, MySQL, Professional on 15 Jul 2008

Auditing your MySQL Data

I was asked recently by a client to help with providing a history of data in certain tables. Like most problems, there is no one single solution, and in this case there are several possible solutions. I was able to provide a database-only solution, with just minimal impact on the existing schema.

Here is my approach; your feedback and alternative input are, as always, welcome.

The problem

Client: I want to keep a history of all changes to two tables, and have a means of viewing this history.

For the purposes of this solution, we will use one table, called ‘customer’ from the Sakila Sample Database.

Solution

For tables to be audited, we will introduce a new column called ‘audit_id’ which is NULLABLE, and which should not affect any existing INSERT statements provided they name their columns (a Best Practice).
We do this to ensure that the Audit Table has the same structure (number and ordering of columns) and can have a Primary Key defined.

Schema Preparation

mysql> USE sakila;
mysql> ALTER TABLE customer ADD audit_id INT UNSIGNED NULL;

We can then create an Audit Schema to store the Audit Table. This helps to ensure a clean schema and support for appropriate backup and recovery. I use a standard of suffixing the existing schema name with ‘_audit’.

mysql> CREATE DATABASE IF NOT EXISTS sakila_audit;
mysql> USE sakila_audit;
mysql> SET FOREIGN_KEY_CHECKS=0;   # (1)
mysql> CREATE TABLE customer LIKE sakila.customer;
mysql> ALTER TABLE customer DROP KEY `email`;   # (2)
mysql> # Foreign Keys (3)
mysql> ALTER TABLE customer
           DROP PRIMARY KEY,
           MODIFY customer_id SMALLINT UNSIGNED NOT NULL,  # (4)
           MODIFY audit_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

NOTES:
(1) Due to Foreign Key Constraints,
(2) We need to remove all existing UNIQUE Keys. I use an I_S query, as shown below, to find these programmatically; you can also view them via SHOW CREATE TABLE customer;
(3) It appears that Foreign Keys are not created with the LIKE syntax. This may change in the future, so this is a note to check accordingly.
(4) We need to drop the primary key, but as this involves an AUTO_INCREMENT column we need to alter that column as well. This step names the primary key column, which will vary per table.

We now have an Audit Table in a separate schema that has the same columns. Part of the process for any new Schema Release is to ensure these tables are kept in sync. An appropriate I_S statement could be used for verification. Support for a TRIGGER that runs on Instance Startup and throws an error to the error log or system error log (possible via UDF) would enable a balance check here.
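One possible shape for such an I_S verification statement is sketched below. It is an illustration rather than the statement I used, and it only checks one direction: it lists columns of sakila.customer that are missing from sakila_audit.customer or that sit at a different position. An empty result means every base column has a match in the same position.

```sql
-- Columns of the base table that are missing from, or out of order in, the audit table.
SELECT b.column_name,
       b.ordinal_position AS base_position,
       a.ordinal_position AS audit_position
FROM   INFORMATION_SCHEMA.columns b
LEFT JOIN INFORMATION_SCHEMA.columns a
       ON  a.table_schema = 'sakila_audit'
       AND a.table_name   = b.table_name
       AND a.column_name  = b.column_name
WHERE  b.table_schema = 'sakila'
AND    b.table_name   = 'customer'
AND   (a.column_name IS NULL OR a.ordinal_position <> b.ordinal_position);
```

A column that exists only in the audit table would not be reported; swapping the two schema names in the query covers that case.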

Access to Audit Information

We provide a VIEW to the Audit table for history purposes. In addition, we use this for simplification of trigger management.

mysql> CREATE VIEW sakila.customer_history AS SELECT * FROM sakila_audit.customer;

NOTE: No index optimization has been performed on the Audit Table. It would be anticipated that existing Indexes could indeed be dropped and replaced with new indexes appropriate for data access.
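As an example of the kind of access this view supports, the sketch below retrieves every recorded version of one customer. The customer_id value, the index name and the choice of indexed columns are assumptions for illustration; the right indexes depend on how you actually query the history.

```sql
-- All recorded versions of one customer, oldest first.
SELECT *
FROM   sakila.customer_history
WHERE  customer_id = 1
ORDER  BY audit_id;

-- Hypothetical index to support that lookup on the Audit Table.
ALTER TABLE sakila_audit.customer
  ADD INDEX idx_customer_audit (customer_id, audit_id);
```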

Trigger Creation

In order to keep a copy of all data, we introduce two database triggers to maintain the history table. One option is to say that History consists of the Current Version (in the base table) and all previous versions (in the Audit Table); this requires 3 triggers. The alternative, used here, is to keep a full copy of all versions, including the current one, in the Audit Table. This requires only 2 triggers and takes more disk space, but is a simpler and cleaner implementation.

USE sakila;
DELIMITER $$
DROP TRIGGER IF EXISTS customer_ari$$
CREATE TRIGGER customer_ari
AFTER INSERT ON customer
FOR EACH ROW
BEGIN
  INSERT INTO customer_history
  SELECT * FROM customer
  WHERE  customer_id = NEW.customer_id;
END;
$$
DROP TRIGGER IF EXISTS customer_aru$$
CREATE TRIGGER customer_aru
AFTER UPDATE ON customer
FOR EACH ROW
BEGIN
  INSERT INTO customer_history
  SELECT * FROM customer
  WHERE  customer_id = NEW.customer_id;
END;
$$
DELIMITER ;

NOTE: I do not generally like to use ‘SELECT *’; however, in this situation the trigger is significantly simplified. This is of benefit if you are maintaining audit triggers on many tables. The disadvantage is you must ensure your schema tables (e.g. sakila and sakila_audit) are always kept in sync with the same number and order of columns. Failing to add a column to the audit database will result in an error, which is a good confirmation. Adding a column in the wrong order may corrupt your data. Exercise caution when modifying the schema in this situation.

Testing

As with any proper coding, we need to test this. The following sample SQL was run to test on a sample database.

SET FOREIGN_KEY_CHECKS=0;
USE sakila;
TRUNCATE TABLE sakila.customer;
TRUNCATE TABLE sakila_audit.customer;
SELECT 'no customer data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'no customer history data', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
INSERT INTO customer (customer_id,store_id,first_name,last_name,email,address_id,active,create_date)
              VALUES(NULL,1,'mickey','mouse','mickey@disney.com',1,TRUE,NOW());
SELECT 'customer data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 1 row', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
INSERT INTO customer(customer_id,store_id,first_name,last_name,email,address_id,active,create_date)
              VALUES(NULL,1,'donald','duck','dduck@warnerbros.com',1,TRUE,NOW());
SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
UPDATE customer SET email='donaldduck@warnerbros.com' where email='dduck@warnerbros.com';
SELECT 'customer data = 2 rows', IF (count(*)=2,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
DELETE FROM customer where email='donaldduck@warnerbros.com';
SELECT 'customer data = 1 rows', IF (count(*)=1,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;
DELETE FROM customer;
SELECT 'customer data = 0 rows', IF (count(*)=0,'OK','ERROR'),COUNT(*) AS VALUE from customer;
SELECT 'customer history data = 3 rows', IF (count(*)=3,'OK','ERROR'),COUNT(*) AS VALUE from customer_history;

Improvements

It would be great if MySQL’s Procedural Language was a little more flexible and robust. Here are some improvements I’d love to see that would enable a more programmatic solution, as the above contains a number of dependencies on schema_name and column_name.

  • Raise Error Handling to throw errors appropriately
  • Anonymous code block support, e.g. BEGIN …code… END; with automatic execution, rather than needing to create a Procedure and then execute it.
  • Ability to execute dynamic SQL more easily, for example CREATE DATABASE IF NOT EXISTS @variable; or CREATE VIEW @schema.@table_name_history FROM …
  • Support for multiple triggers of the same type (BEFORE|AFTER INSERT|UPDATE|DELETE) per table.
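On the dynamic SQL point, MySQL’s prepared statement syntax (PREPARE and EXECUTE) can run a statement built from a string, which is one verbose way to parameterize a schema name. This is a sketch only: which statement types are permitted in a prepared statement depends on your MySQL version, so check the manual for the version you run. The variable names are illustrative.

```sql
SET @schema = 'sakila';

-- Build the statement text, quoting the derived schema name with backticks.
SET @sql = CONCAT('CREATE DATABASE IF NOT EXISTS `', @schema, '_audit`');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```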

INFORMATION_SCHEMA Query

mysql> SET @schema='sakila';
mysql> SET @table='customer';
mysql> SELECT DISTINCT CONCAT('ALTER TABLE ', table_name, ' DROP KEY `',constraint_name,'`;') AS cmd
           FROM INFORMATION_SCHEMA.table_constraints
           WHERE constraint_schema=@schema
           AND table_name=@table
           AND constraint_type='UNIQUE';
Posted under Databases, MySQL, Professional on 15 Jul 2008

Focus on what you do best

When you have a great idea for a web application, it can be hard, with all the moving parts to consider, to focus just on what makes you unique or differentiates you from everybody else.

You may want to have control over your forums, comments, chat, photo management etc., i.e. user data, but how much does that help you? Is allocating resources to these features, when plenty of completed applications exist, distracting you and lengthening your time to market?

I always like to refer to Guy Kawasaki’s quote “Don’t worry, be crappy”. While I don’t necessarily agree with just throwing functionality out to the www (I believe in quality over quantity), you want to ensure that more time is spent reviewing the input for new or improved features rather than on bugs, bugs, bugs.


Ping.fm and Plurk are two new community driven sites that have leveraged the functionality of other sites, these being Get Satisfaction - People-Powered Customer Service for Everything! and Disq Us - Turn your blog comments into a webwide discussion.

There are advantages and disadvantages to this approach. As a smaller web site with a growing community, using a third party to manage something exposes what you do to a wider audience, which can greatly help exposure and associated marketing at no cost. On the down side, you are losing traffic to another site.

You need to ensure you can always get access to your data and your community contributions. Ensuring adequate APIs for integration and data extraction is key. From a technology perspective, BitKeeper and LaunchPad come to mind. BitKeeper is a closed source version control system that MySQL used. This was a killer for community contributions: individual users simply could not contribute, and if they even wanted access to the source code via the repository they had to pay for an appropriate client. SourceForge and Apache are two examples of huge communities that leverage the power of the community. LaunchPad is the latest kid on the block, but suffers from the fact that while access to applications hosted there is free, the actual LaunchPad code itself is closed. This has caused some issues.

It’s a fine line, and in the genre of software development, the Internet can create copies of anything just about overnight. More and more I hear about companies working in stealth mode rather than with open community input and interaction, but that’s a topic for another discussion.

Posted under Professional on 15 Jul 2008