PostgreSQL table audit

In our big-brother-esque age, we like to have everything under control. While in real life it may not always be a good thing, we certainly want to know what happens to our data. I've seen all sort of solutions to log database changes, many of them in PHP or some other language, but IMHO that's not how it should be done. Since the DBMS itself often offers the tools we need, let's see how to use them effectively.
Database table auditing seems to be an hot topic: there are ad-hoc tools , there's an example in the postgresql manual and of course there are blog posts about it. Nothing new under the sun.
But I'd still like to talk about it :-), since we're going to see a different approach, and learn something about PostgreSQL triggers and functions in the process, aren't we?

The goal

So how is this approach different? The idea is not to store a 1:1 copy of the monitored table with a record for each change, but to store each (and only) changed field in a record, with many details that we'll soon explore, using just one audit table (and not one for each monitored table). We're taking one step further than what David Fetter suggested in the GeneralBits (2nd April 2005) .
This is the audit table structure (you may change the size of the fields according to your needs):

CREATE TABLE audit_table (
  ts TIMESTAMP WITH TIME ZONE,
  usr VARCHAR(30),
  tbl VARCHAR(30),
  fld VARCHAR(30),
  pk_name VARCHAR(30),
  pk_value VARCHAR(40),
  mod_type CHAR(6),
  old_val TEXT,
  new_val TEXT,
);

To test our function, we create two other tables, which represent some real tables you already have and want to monitor:

CREATE TABLE test1 (
  id INTEGER NOT NULL PRIMARY KEY,
  somename VARCHAR(50),
  somevalue INTEGER,
  somets TIMESTAMP
);
CREATE TABLE test2 (
  pkid INTEGER NOT NULL PRIMARY KEY,
  testname TEXT,
  testvalue REAL,
  testdate DATE
);

NB: this example will work for tables with a Primary Key on one field only.

Prerequisites

The following function requires the PL/TCL language to be loaded in PostgreSQL. You can do it with this simple command:

$ createlang -U username pltcl dbname

We need to use PL/TCL since we can't reference NEW and OLD as arrays in PL/PgSQL. Of course, if you don't have TCL/TK 8.4 installed, it's the first thing to do. There are also some windows binaries provided by ActiveState.


N.B. for Windows users: To configure TCL, remember to set these two system environment variables:
TCL_LIBRARY = /path/to/Tcl/lib/tcl8.4
TCLLIBPATH = /path/to/Tcl/lib
otherwise you'll get this nasty and clueless error:
ERROR: could not create "normal" interpreter.

Table audit function

This is the brain of our auditing technique, in all its glory:

CREATE OR REPLACE FUNCTION "public"."log_to_audit_table" () RETURNS trigger AS
$body$
spi_exec "SELECT CURRENT_USER AS tguser"
spi_exec "SELECT relname AS tgname FROM pg_class WHERE relfilenode = $TG_relid"
 
#skip changes on audit_table
if {[string equal -nocase $tgname audit_table]} { return OK }
 
#get PK name
set pk_name ""
spi_exec "SELECT a.attname AS pk_name FROM pg_class c, pg_attribute a, pg_index i
 WHERE c.relname = '$tgname'
 AND c.oid=i.indrelid
 AND a.attnum > 0
 AND a.attrelid = i.indexrelid
 AND i.indisprimary='t'"
 
switch $TG_op {
INSERT {
  set pk_value ""
 
  #get PK value
  foreach field $TG_relatts {
    if {[string equal -nocase [lindex [array get NEW $field] 0] $pk_name]} {
      set pk_value [lindex [array get NEW $field] 1]
      break;
    }
  }
  #log inserted row values
  foreach field $TG_relatts {
    if {! [string equal -nocase [lindex [array get NEW $field] 0] $pk_name]} {
      set modified_field [lindex [array get NEW $field] 0]
      set current_value [lindex [array get NEW $field] 1]
      spi_exec -array C "INSERT INTO audit_table(ts, usr, tbl, fld, pk_name, pk_value, mod_type, old_val, new_val)
        VALUES (CURRENT_TIMESTAMP, '$tguser', '$tgname', '$modified_field', '$pk_name', '$pk_value', '$TG_op', NULL, '$current_value')"
    }
  }
}
UPDATE {
  set pk_value ""
 
  #get PK value
  foreach field $TG_relatts {
    if {[string equal -nocase [lindex [array get NEW $field] 0] $pk_name]} {
      set pk_value [lindex [array get NEW $field] 1]
      break;
    }
  }
  #log inserted row values
  foreach field $TG_relatts {
    #check changed fields
    if {[string equal -nocase [array get NEW $field] [array get OLD $field]] == 0} {
      set modified_field [lindex [array get OLD $field] 0]
      if {[string compare $modified_field ""] == 0} {
        set modified_field [lindex  [array get NEW $field] 0]
      }
      set previous_value [lindex [array get OLD $field] 1]
      set current_value  [lindex [array get NEW $field] 1]
      spi_exec -array C "INSERT INTO audit_table(ts, usr, tbl, fld, pk_name, pk_value, mod_type, old_val, new_val)
        VALUES (CURRENT_TIMESTAMP, '$tguser', '$tgname', '$modified_field', '$pk_name', '$pk_value', '$TG_op', '$previous_value', '$current_value')"
    }
  }
}
DELETE {
  set pk_value ""
 
  #get PK value
  foreach field $TG_relatts {
    if {[string equal -nocase [lindex [array get OLD $field] 0] $pk_name]} {
      set pk_value [lindex [array get OLD $field] 1]
      break;
    }
  }
  #log inserted row values
  foreach field $TG_relatts {
    if {! [string equal -nocase [lindex [array get OLD $field] 0] $pk_name]} {
      set modified_field [lindex [array get OLD $field] 0]
      set previous_value [lindex [array get OLD $field] 1]
      spi_exec -array C "INSERT INTO audit_table(ts, usr, tbl, fld, pk_name, pk_value, mod_type, old_val, new_val)
        VALUES (CURRENT_TIMESTAMP, '$tguser', '$tgname', '$modified_field', '$pk_name', '$pk_value', '$TG_op', '$previous_value', NULL)"
    }
  }
}
}
return OK
$body$
LANGUAGE 'pltcl' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;

The function is a bit long but should be easy to understand. First, we get the name of the current user and table, then we get the PK name with a query to the system catalogs and finally we log the event to the audit_table. Depending on the event type (INSERT, UPDATE or DELETE), the function loops through the table fields, checks which one(s) changed, and stores a record into the audit_table for each changed field.

Create a TRIGGER for each table to monitor

Now that we have defined a generic function that should work with any table (with the limitation of it having a PK on one field only), we can "attach" it to a table with a simple trigger:

--DROP TRIGGER tg_audit_test1 ON test1;
CREATE TRIGGER "tg_audit_test1"
AFTER INSERT OR DELETE OR UPDATE ON "test1"
FOR EACH ROW
EXECUTE PROCEDURE "log_to_audit_table" ();
 
--DROP TRIGGER tg_audit_test2 ON test2;
CREATE TRIGGER "tg_audit_test2"
AFTER INSERT OR DELETE OR UPDATE ON "test2"
FOR EACH ROW
EXECUTE PROCEDURE "log_to_audit_table" ();

Just change the trigger name and the table name, no other change is needed.

Test the auditing procedure

Now that we've prepared everything, let's see how it works by inserting a record into a monitored table:

INSERT INTO test1 (id, somename, somevalue, somets) VALUES (1, 'php', 5, CURRENT_TIMESTAMP);

Let's see what happened to the audit table:

testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type | old_val |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+---------+---------------------------
 2007-02-19 23:18:51.468+01 | postgres | test1 |           | id      | 1        | INSERT   |         |
 2007-02-19 23:18:51.468+01 | postgres | test1 | somename  | id      | 1        | INSERT   |         | php
 2007-02-19 23:18:51.468+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |         | 5
 2007-02-19 23:18:51.468+01 | postgres | test1 | somets    | id      | 1        | INSERT   |         | 2007-02-19 23:18:51.468+01
(4 rows)

The function saved into the audit_table the following information:

  • the timestamp of the event
  • the user who caused the event
  • the names of the table and field involved
  • the name and value of the primary key of the affected record
  • the event type (INSERT | UPDATE | DELETE)
  • the old and new values of the field

Let's insert a record in the other table:

testdb=# INSERT INTO test2 (pkid, testname, testvalue, testdate) VALUES (1, 'php', 5.21, CURRENT_DATE);
INSERT 1
testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type | old_val |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+---------+---------------------------
 2007-02-19 23:26:17.437+01 | postgres | test1 |           | id      | 1        | INSERT   |         |
 2007-02-19 23:26:17.437+01 | postgres | test1 | somename  | id      | 1        | INSERT   |         | php
 2007-02-19 23:26:17.437+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |         | 5
 2007-02-19 23:26:17.437+01 | postgres | test1 | somets    | id      | 1        | INSERT   |         | 2007-02-19 23:26:17.437+01
 2007-02-19 23:28:09.656+01 | postgres | test2 |           | pkid    | 1        | INSERT   |         |
 2007-02-19 23:28:09.656+01 | postgres | test2 | testname  | pkid    | 1        | INSERT   |         | php
 2007-02-19 23:28:09.656+01 | postgres | test2 | testvalue | pkid    | 1        | INSERT   |         | 5.21
 2007-02-19 23:28:09.656+01 | postgres | test2 | testdate  | pkid    | 1        | INSERT   |         | 2007-02-19
(8 rows)

The audit_table registered the insertion, with one record per field. Please note that it recognized the event happened on another table, and automatically retrieved the field names and values.
What if we update only one field?

testdb=# UPDATE test1 SET somevalue=4;
UPDATE 1
testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type | old_val |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+---------+---------------------------
 2007-02-19 23:26:17.437+01 | postgres | test1 |           | id      | 1        | INSERT   |         |
 2007-02-19 23:26:17.437+01 | postgres | test1 | somename  | id      | 1        | INSERT   |         | php
 2007-02-19 23:26:17.437+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |         | 5
 2007-02-19 23:26:17.437+01 | postgres | test1 | somets    | id      | 1        | INSERT   |         | 2007-02-19 23:26:17.437+01
 2007-02-19 23:28:09.656+01 | postgres | test2 |           | pkid    | 1        | INSERT   |         |
 2007-02-19 23:28:09.656+01 | postgres | test2 | testname  | pkid    | 1        | INSERT   |         | php
 2007-02-19 23:28:09.656+01 | postgres | test2 | testvalue | pkid    | 1        | INSERT   |         | 5.21
 2007-02-19 23:28:09.656+01 | postgres | test2 | testdate  | pkid    | 1        | INSERT   |         | 2007-02-19
 2007-02-19 23:30:59.140+01 | postgres | test1 | somevalue | id      | 1        | UPDATE   | 5       | 4
(9 rows)

This time we have just one new record, since only one field changed its contents. The function correctly saved both the old and the new value. As a last test, let's delete a row:

testdb=# DELETE FROM test2;
DELETE 1
testdb=# SELECT * FROM audit_table;
           ts               |   usr    |  tbl  |    fld    | pk_name | pk_value | mod_type |  old_val   |         new_val
----------------------------+----------+-------+-----------+---------+----------+----------+------------+---------------------------
 2007-02-19 23:26:17.437+01 | postgres | test1 |           | id      | 1        | INSERT   |            |
 2007-02-19 23:26:17.437+01 | postgres | test1 | somename  | id      | 1        | INSERT   |            | php
 2007-02-19 23:26:17.437+01 | postgres | test1 | somevalue | id      | 1        | INSERT   |            | 5
 2007-02-19 23:26:17.437+01 | postgres | test1 | somets    | id      | 1        | INSERT   |            | 2007-02-19 23:26:17.437+01
 2007-02-19 23:28:09.656+01 | postgres | test2 |           | pkid    | 1        | INSERT   |            |
 2007-02-19 23:28:09.656+01 | postgres | test2 | testname  | pkid    | 1        | INSERT   |            | php
 2007-02-19 23:28:09.656+01 | postgres | test2 | testvalue | pkid    | 1        | INSERT   |            | 5.21
 2007-02-19 23:28:09.656+01 | postgres | test2 | testdate  | pkid    | 1        | INSERT   |            | 2007-02-19
 2007-02-19 23:30:59.140+01 | postgres | test1 | somevalue | id      | 1        | UPDATE   | 5          | 4
 2007-02-19 23:34:04.125+01 | postgres | test2 |           | pkid    | 1        | DELETE   |            |
 2007-02-19 23:34:04.125+01 | postgres | test2 | testname  | pkid    | 1        | DELETE   | php        |
 2007-02-19 23:34:04.125+01 | postgres | test2 | testvalue | pkid    | 1        | DELETE   | 5.21       |
 2007-02-19 23:34:04.125+01 | postgres | test2 | testdate  | pkid    | 1        | DELETE   | 2007-02-19 |
(13 rows)

Again, the event was duly logged into audit_table.
Download the code

Conclusion

In this tutorial we've seen how to write a PostgreSQL function in PL/TCL and a trigger to attach the function to some events on other tables. As you can guess, the function as-is isn't perfect nor bullet-proof: you can't monitor tables with multiple-field PKs or LOBs, and the audit_table can become huge in no time. Still, you may find it useful, if nothing else for learning purposes.

Till next time, take care.

Update

It looks like someone found this tutorial useful, and extended it to solve a few problems of the original solution. I recommend you to have a look at this blog entry too.

Update 2010-07-11

Davor Josipovic implemented the function in plpgsql, and supporting multiple PK columns. Check it out!



4 responses to "PostgreSQL table audit"

Hi Lorenzo,
Thank you very much for your contribution. I want to add a remark concerning line 4. I had some issues with tables which were spatially enabled with PostGIS. The relfilenode was unknown for $TG_relid. Actually, the values in columns "OID" are always the same as in the column "relfilenode" a part for spatially enabled tables. Hence, I changed the code to compare with OID and the function works properly. Did you have a particular reason to compare the variable $TG_relid with the column relfilenode?
I am curious. Hope my remark is of interest to you.
Best regards,
Quirin
_________________________________
Albert-Ludwigs-Universität Freiburg
Department of Microsystems Engineering
Georges-Köhler-Allee 103
79110 Freiburg - Germany
Tel: +49761/203-7158
Fax: +49761/203-7222
e-mail: quirin.hamp at imtek.uni-freiburg.de
Internet: http://www.imtek.uni-freiburg.de/emp

many thanks for such a great head start on pltcl triggers
in my version of pgsql8.4 it proved necessary to use set tgname $TG_table_name
rather than
spi_exec \"SELECT relname AS tgname FROM pg_class WHERE relfilenode = $TG_relid\"
it works well with that one change although does not seem to handle tables with boolean fields. The update from jimmyorg\'s blog also does not handle it (at least for me).

Hey, thank you so much for this tutorial.
I would like ask you something.
For each row the trigger insert a new line on audit table. How I can do to make the trigger insert only one new line for the same transaction?
Or if it isn\'t possible...how I could get the primary key value inside a NEW or OLD record? It\'s possible?
Thank you so much.

@Diemesleno
You can declare a trigger for each statement instead of each row CREATE TRIGGER \"tg_audit_test1\" AFTER INSERT OR DELETE OR UPDATE ON \"test1\" {FOR EACH ROW} -> FOR EACH STATEMENT
About the primary key value if you look closer, it\'s in his code. If you\'re asking for pure plSQL statement, just execute EXECUTE \'INSERT INTO \"Audit\" SELECT nextval(\'\'\"Audit_id_seq\"\'\'::regclass), now(),($1).\"lastModifiedByUserId\",\' ||quote_literal(TG_TABLE_NAME) ||\',\'||quote_literal(cname)||\',\'||quote_literal(pk_name)||\',($1).\'||quote_ident(pk_name)||\',\'\'U\'\',($1).\'||quote_ident(cname)||\'::text, ($2).\' || quote_ident(ri.fld) ||\' WHERE ( $1).\' || quote_ident(cname) || \' <> ($2).\' || quote_ident(cname) || \' ;\' USING OLD,NEW;

Lorenzo Alberton

Lorenzo Alberton Lorenzo PHP5 ZCE - Zend Certified Engineer has been working with large enterprise UK companies for the past years and is now Chief Tech Architect at DataSift. He's an international conference speaker and a long-time contributor to many open source projects. Lorenzo Alberton's profile on LinkedIN View Lorenzo Alberton's Twitter stream

Lorenzo Alberton - Sun Certified MySQL 5 Developer

Tags

AJAX, Apache, Book Review, Charset, Cheat Sheet, Data structures, Database, Firebird SQL, Hadoop, Imagick, INFORMATION_SCHEMA, JavaScript, Kafka, Linux, Message Queues, mod_rewrite, Monitoring, MySQL, NoSQL, Oracle, PDO, PEAR, Performance, PHP, PostgreSQL, Profiling, Scalability, Security, SPL, SQL Server, SQLite, Testing, Tutorial, TYPO3, Windows, Zend Framework

Buy me a book - Joel On Software