PostgreSQL table audit

In our big-brother-esque age, we like to have everything under control. While in real life it may not always be a good thing, we certainly want to know what happens to our data. I've seen all sort of solutions to log database changes, many of them in PHP or some other language, but IMHO that's not how it should be done. Since the DBMS itself often offers the tools we need, let's see how to use them effectively.
Database table auditing seems to be an hot topic: there are ad-hoc tools , there's an example in the postgresql manual and of course there are blog posts about it. Nothing new under the sun.
But I'd still like to talk about it :-), since we're going to see a different approach, and learn something about PostgreSQL triggers and functions in the process, aren't we?

The goal

So how is this approach different? The idea is not to store a 1:1 copy of the monitored table with a record for each change, but to store each (and only) changed field in a record, with many details that we'll soon explore, using just one audit table (and not one for each monitored table). We're taking one step further than what David Fetter suggested in the GeneralBits (2nd April 2005) .
This is the audit table structure (you may change the size of the fields according to your needs):

CREATE TABLE audit_table (
  ts TIMESTAMP WITH TIME ZONE,
  usr VARCHAR(30),
  tbl VARCHAR(30),
  fld VARCHAR(30),
  pk_name VARCHAR(30),
  pk_value VARCHAR(40),
  mod_type CHAR(6),
  old_val TEXT,
  new_val TEXT,
);

To test our function, we create two other tables, which represent some real tables you already have and want to monitor:

CREATE TABLE test1 (
  id INTEGER NOT NULL PRIMARY KEY,
  somename VARCHAR(50),
  somevalue INTEGER,
  somets TIMESTAMP
);
CREATE TABLE test2 (
  pkid INTEGER NOT NULL PRIMARY KEY,
  testname TEXT,
  testvalue REAL,
  testdate DATE
);

NB: this example will work for tables with a Primary Key on one field only.

Prerequisites

The following function requires the PL/TCL language to be loaded in PostgreSQL. You can do it with this simple command:

$ createlang -U username pltcl dbname

We need to use PL/TCL since we can't reference NEW and OLD as arrays in PL/PgSQL. Of course, if you don't have TCL/TK 8.4 installed, it's the first thing to do. There are also some windows binaries provided by ActiveState.

N.B. for Windows users: To configure TCL, remember to set these two system environment variables:
TCL_LIBRARY = /path/to/Tcl/lib/tcl8.4
TCLLIBPATH = /path/to/Tcl/lib
otherwise you'll get this nasty and clueless error:
ERROR: could not create "normal" interpreter.

Table audit function

This is the brain of our auditing technique, in all its glory:

CREATE OR REPLACE FUNCTION "public"."log_to_audit_table" () RETURNS trigger AS
$body$
spi_exec "SELECT CURRENT_USER AS tguser"
spi_exec "SELECT relname AS tgname FROM pg_class WHERE relfilenode = $TG_relid"

#skip changes on audit_table
if {[string equal -nocase $tgname audit_table]} { return OK }

#get PK name
set pk_name ""
spi_exec "SELECT a.attname AS pk_name FROM pg_class c, pg_attribute a, pg_index i
 WHERE c.relname = '$tgname'
 AND c.oid=i.indrelid
 AND a.attnum > 0
 AND a.attrelid = i.indexrelid
 AND i.indisprimary='t'"

switch $TG_op {
INSERT {
  set pk_value ""

  #get PK value
  foreach field $TG_relatts {
    if {[string equal -nocase [lindex [array get NEW $field] 0] $pk_name]} {
      set pk_value [lindex [array get NEW $field] 1]
      break;
    }
  }
  #log inserted row values
  foreach field $TG_relatts {
    if {! [string equal -nocase [lindex [array get NEW $field] 0] $pk_name]} {
      set modified_field [lindex [array get NEW $field] 0]
      set current_value [lindex [array get NEW $field] 1]
      spi_exec -array C "INSERT INTO audit_table(ts, usr, tbl, fld, pk_name, pk_value, mod_type, old_val, new_val)
        VALUES (CURRENT_TIMESTAMP, '$tguser', '$tgname', '$modified_field', '$pk_name', '$pk_value', '$TG_op', NULL, '$current_value')"
    }
  }
}
UPDATE {
  set pk_value ""

  #get PK value
  foreach field $TG_relatts {
    if {[string equal -nocase [lindex [array get NEW $field] 0] $pk_name]} {
      set pk_value [lindex [array get NEW $field] 1]
      break;
    }
  }
  #log inserted row values
  foreach field $TG_relatts {
    #check changed fields
    if {[string equal -nocase [array get NEW $field] [array get OLD $field]] == 0} {
      set modified_field [lindex [array get OLD $field] 0]
      if {[string compare $modified_field ""] == 0} {
        set modified_field [lindex  [array get NEW $field] 0]
      }
      set previous_value [lindex [array get OLD $field] 1]
      set current_value  [lindex [array get NEW $field] 1]
      spi_exec -array C "INSERT INTO audit_table(ts, usr, tbl, fld, pk_name, pk_value, mod_type, old_val, new_val)
        VALUES (CURRENT_TIMESTAMP, '$tguser', '$tgname', '$modified_field', '$pk_name', '$pk_value', '$TG_op', '$previous_value', '$current_value')"
    }
  }
}
DELETE {
  set pk_value ""

  #get PK value
  foreach field $TG_relatts {
    if {[string equal -nocase [lindex [array get OLD $field] 0] $pk_name]} {
      set pk_value [lindex [array get OLD $field] 1]
      break;
    }
  }
  #log inserted row values
  foreach field $TG_relatts {
    if {! [string equal -nocase [lindex [array get OLD $field] 0] $pk_name]} {
      set modified_field [lindex [array get OLD $field] 0]
      set previous_value [lindex [array get OLD $field] 1]
      spi_exec -array C "INSERT INTO audit_table(ts, usr, tbl, fld, pk_name, pk_value, mod_type, old_val, new_val)
        VALUES (CURRENT_TIMESTAMP, '$tguser', '$tgname', '$modified_field', '$pk_name', '$pk_value', '$TG_op', '$previous_value', NULL)"
    }
  }
}
}
return OK
$body$
LANGUAGE 'pltcl' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;

The function is a bit long but should be easy to understand. First, we get the name of the current user and table, then we get the PK name with a query to the system catalogs and finally we log the event to the audit_table. Depending on the event type (INSERT, UPDATE or DELETE), the function loops through the table fields, checks which one(s) changed, and stores a record into the audit_table for each changed field.

Create a TRIGGER for each table to monitor

Now that we have defined a generic function that should work with any table (with the limitation of it having a PK on one field only), we can "attach" it to a table with a simple trigger:

--DROP TRIGGER tg_audit_test1 ON test1;
CREATE TRIGGER "tg_audit_test1"
AFTER INSERT OR DELETE OR UPDATE ON "test1"
FOR EACH ROW
EXECUTE PROCEDURE "log_to_audit_table" ();

--DROP TRIGGER tg_audit_test2 ON test2;
CREATE TRIGGER "tg_audit_test2"
AFTER INSERT OR DELETE OR UPDATE ON "test2"
FOR EACH ROW
EXECUTE PROCEDURE "log_to_audit_table" ();

Just change the trigger name and the table name, no other change is needed.

Test the auditing procedure

Now that we've prepared everything, let's see how it works by inserting a record into a monitored table:

INSERT INTO test1 (id, somename, somevalue, somets)
  VALUES (1, 'php', 5, CURRENT_TIMESTAMP);

Let's see what happened to the audit table:

testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type | old_val |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+---------+---------------------------
 2007-02-19 23:18:51.468+01 | postgres | test1 |           | id      | 1        | INSERT   |         |
 2007-02-19 23:18:51.468+01 | postgres | test1 | somename  | id      | 1        | INSERT   |         | php
 2007-02-19 23:18:51.468+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |         | 5
 2007-02-19 23:18:51.468+01 | postgres | test1 | somets    | id      | 1        | INSERT   |         | 2007-02-19 23:18:51.468+01
(4 rows)

The function saved into the audit_table the following information:

Let's insert a record in the other table:

testdb=# INSERT INTO test2 (pkid, testname, testvalue, testdate) VALUES (1, 'php', 5.21, CURRENT_DATE);
INSERT 1
testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type | old_val |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+---------+---------------------------
 2007-02-19 23:26:17.437+01 | postgres | test1 |           | id      | 1        | INSERT   |         |
 2007-02-19 23:26:17.437+01 | postgres | test1 | somename  | id      | 1        | INSERT   |         | php
 2007-02-19 23:26:17.437+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |         | 5
 2007-02-19 23:26:17.437+01 | postgres | test1 | somets    | id      | 1        | INSERT   |         | 2007-02-19 23:26:17.437+01
 2007-02-19 23:28:09.656+01 | postgres | test2 |           | pkid    | 1        | INSERT   |         |
 2007-02-19 23:28:09.656+01 | postgres | test2 | testname  | pkid    | 1        | INSERT   |         | php
 2007-02-19 23:28:09.656+01 | postgres | test2 | testvalue | pkid    | 1        | INSERT   |         | 5.21
 2007-02-19 23:28:09.656+01 | postgres | test2 | testdate  | pkid    | 1        | INSERT   |         | 2007-02-19
(8 rows)

The audit_table registered the insertion, with one record per field. Please note that it recognized the event happened on another table, and automatically retrieved the field names and values.
What if we update only one field?

testdb=# UPDATE test1 SET somevalue=4;
UPDATE 1
testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type | old_val |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+---------+---------------------------
 2007-02-19 23:26:17.437+01 | postgres | test1 |           | id      | 1        | INSERT   |         |
 2007-02-19 23:26:17.437+01 | postgres | test1 | somename  | id      | 1        | INSERT   |         | php
 2007-02-19 23:26:17.437+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |         | 5
 2007-02-19 23:26:17.437+01 | postgres | test1 | somets    | id      | 1        | INSERT   |         | 2007-02-19 23:26:17.437+01
 2007-02-19 23:28:09.656+01 | postgres | test2 |           | pkid    | 1        | INSERT   |         |
 2007-02-19 23:28:09.656+01 | postgres | test2 | testname  | pkid    | 1        | INSERT   |         | php
 2007-02-19 23:28:09.656+01 | postgres | test2 | testvalue | pkid    | 1        | INSERT   |         | 5.21
 2007-02-19 23:28:09.656+01 | postgres | test2 | testdate  | pkid    | 1        | INSERT   |         | 2007-02-19
 2007-02-19 23:30:59.140+01 | postgres | test1 | somevalue | id      | 1        | UPDATE   | 5       | 4
(9 rows)

This time we have just one new record, since only one field changed its contents. The function correctly saved both the old and the new value. As a last test, let's delete a row:

testdb=# DELETE FROM test2;
DELETE 1
testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type |  old_val   |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+------------+---------------------------
 2007-02-19 23:26:17.437+01 | postgres | test1 |           | id      | 1        | INSERT   |            |
 2007-02-19 23:26:17.437+01 | postgres | test1 | somename  | id      | 1        | INSERT   |            | php
 2007-02-19 23:26:17.437+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |            | 5
 2007-02-19 23:26:17.437+01 | postgres | test1 | somets    | id      | 1        | INSERT   |            | 2007-02-19 23:26:17.437+01
 2007-02-19 23:28:09.656+01 | postgres | test2 |           | pkid    | 1        | INSERT   |            |
 2007-02-19 23:28:09.656+01 | postgres | test2 | testname  | pkid    | 1        | INSERT   |            | php
 2007-02-19 23:28:09.656+01 | postgres | test2 | testvalue | pkid    | 1        | INSERT   |            | 5.21
 2007-02-19 23:28:09.656+01 | postgres | test2 | testdate  | pkid    | 1        | INSERT   |            | 2007-02-19
 2007-02-19 23:30:59.140+01 | postgres | test1 | somevalue | id      | 1        | UPDATE   | 5          | 4
 2007-02-19 23:34:04.125+01 | postgres | test2 |           | pkid    | 1        | DELETE   |            |
 2007-02-19 23:34:04.125+01 | postgres | test2 | testname  | pkid    | 1        | DELETE   | php        |
 2007-02-19 23:34:04.125+01 | postgres | test2 | testvalue | pkid    | 1        | DELETE   | 5.21       |
 2007-02-19 23:34:04.125+01 | postgres | test2 | testdate  | pkid    | 1        | DELETE   | 2007-02-19 |
(13 rows)

Again, the event was duly logged into audit_table.

Download the code

Conclusion

In this tutorial we've seen how to write a PostgreSQL function in PL/TCL and a trigger to attach the function to some events on other tables. As you can guess, the function as-is isn't perfect nor bullet-proof: you can't monitor tables with multiple-field PKs or LOBs, and the audit_table can become huge in no time. Still, you may find it useful, if nothing else for learning purposes.

Till next time, stay well.

Update

It looks like someone found this tutorial useful, and extended it to solve a few problems of the original solution. I recommend you to have a look at this blog entry too.



Socially Bookmark

Delicious     Digg     Furl     Reddit     Cosmos

Spurl     Blinklist     Simpy     Blogmarks     YahooMyWeb